[ 
https://issues.apache.org/jira/browse/LUCENE-7132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated LUCENE-7132:
-----------------------------
    Attachment: LUCENE-7132.patch


Ok, based on Mike's clues, here are the changes made in this new patch that (without 
Mike's fix) help reproduce the bug more often...

{panel}
* TestBoolean2
** randomizes usage of coord in all the queries tested
** randomly injects (empty) filler docs into the index to force matches to 
exist in different buckets
*** this required some changes to queriesTest so it knows how to compute the 
correct expDocNrs to pass to CheckHits
** updated the "bigSearcher" assertions to also compare the results from 
multiple Scorers
*** I never saw this actually make a difference in test failure/success, likely 
because bigSearcher only tacks additional docs onto the end of the index; but 
since the test was already comparing multiple scorers for the main index (to 
find situations where BulkScorer would return different results from the 
default Scorer), it seemed like a good idea to check here as well.
** added a "singleSegmentSearcher" that has a copy of the main index that has 
been forceMerged to a single segment
** updated queriesTest to also use CheckHits.checkHitsQuery to compare results 
from the main index with the singleSegmentSearcher
*** this was inspired by the fact that in previous testing forceMerge(1) was 
changing the scores of documents even though there were no deletions, which 
Mike couldn't explain.
*** {color:red}This currently causes failures in some cases, even with Mike's 
fix applied - see below{color}
* replaced TestSimpleExplanationsWithHeavyIDF with 
TestSimpleExplanationsWithFillerDocs
** the filler docs are now injected in between existing docs
** depending on a random boolean, all filler docs are either:
*** empty -- so the queries from the superclass tests aren't modified, just 
the expected docids
*** full of terms used in the queries (to muck with IDF like in the previous 
patch) -- so the queries from the superclass are also wrapped to exclude these 
filler docs
* BaseExplanationTestCase
** removed some no-longer-needed refactoring that was in the earlier patch
** added some randomization to wrap any query tested in a BooleanQuery with a 
"SHOULD" clause that matches nothing (to force more interesting coord cases)
* removed TestBooleanScoreConsistency and RajeshData.txt since they are no 
longer needed to easily reproduce the underlying bug

(NOTE: I experimented with injecting empty filler docs in TestMinShouldMatch2 
as well, but it slowed the test WAY down -- i.e. close to 5 minutes on my 
machine, I'm guessing because of how it uses "SlowMinShouldMatchScorer" -- so 
I abandoned those changes.  Might be worth considering adding those back if 
someone sees a way to prevent it from being so slow.)
{panel}
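To illustrate the expDocNrs bookkeeping mentioned above, here is a minimal sketch (not the actual patch code -- the patch injects fillers randomly, while this assumes, for simplicity, a fixed number of empty filler docs before every real doc; the class and method names are hypothetical):

```java
import java.util.Arrays;

// Hypothetical helper: if numFillerDocs empty filler docs precede every
// "real" doc, each original doc id d shifts to d * (numFillerDocs + 1) +
// numFillerDocs, so the expected doc nrs for CheckHits must be remapped.
public class FillerDocRemap {
    static int[] remapExpectedDocNrs(int[] expDocNrs, int numFillerDocs) {
        int[] adjusted = new int[expDocNrs.length];
        for (int i = 0; i < expDocNrs.length; i++) {
            adjusted[i] = expDocNrs[i] * (numFillerDocs + 1) + numFillerDocs;
        }
        return adjusted;
    }

    public static void main(String[] args) {
        // With 2 fillers before each real doc, real docs 0,1,2 land at ids 2,5,8.
        int[] adjusted = remapExpectedDocNrs(new int[] {0, 1, 2}, 2);
        System.out.println(Arrays.toString(adjusted)); // → [2, 5, 8]
    }
}
```

Since the filler docs are empty, the queries themselves still match exactly the same (real) documents; only the docid expectations shift.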

Except for the above note in red (about TestBoolean2 failing in some cases when 
comparing the hits+scores between the original index and a copy of that index 
merged down to a single segment), Mike's fix resolved every failure I was able 
to generate with these test changes.

I suspect that either:
* I'm making some mistaken assumption about the validity of comparing scores 
between indexes like that (I don't think I am, based on Mike's comments from 
IRC yesterday)
* There is a bug in my changes related to creating a "singleSegmentSearcher" 
that can be compared to "searcher" (extremely possible)
* There is still another, as yet undiagnosed, bug somewhere in 
BooleanQuery/BooleanScorer that causes discrepancies between otherwise 
equivalent indexes based on segment boundaries (seems plausible given Mike's 
comments on IRC this morning that he couldn't think of any reason why the bug 
he identified would be affected by forceMerge)

Here's an example of a failing seed with the current patch -- note that, based 
on the stack trace, both searchers produced a list of results that match the 
expected docids in the expected order, but the scores (for at least the first 
docid matched) are not equivalent...

{noformat}
   [junit4] <JUnit4> says Привет! Master seed: 1205E6391E501C49
   [junit4] Executing 1 suite with 1 JVM.
   [junit4] 
   [junit4] Started J0 PID(28313@localhost).
   [junit4] Suite: org.apache.lucene.search.TestBoolean2
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestBoolean2 
-Dtests.method=testQueries10 -Dtests.seed=1205E6391E501C49 -Dtests.slow=true 
-Dtests.locale=zh-HK -Dtests.timezone=Australia/ACT -Dtests.asserts=true 
-Dtests.file.encoding=ISO-8859-1
   [junit4] FAILURE 0.04s | TestBoolean2.testQueries10 <<<
   [junit4]    > Throwable #1: junit.framework.AssertionFailedError: Hit 0, doc 
nrs 2 and 2
   [junit4]    > unequal       : 0.5625806
   [junit4]    >            and: 0.42193544
   [junit4]    > for query:+field:w3 +field:xx +field:w2 field:zz
   [junit4]    >        at 
__randomizedtesting.SeedInfo.seed([1205E6391E501C49:E53B440260B039AE]:0)
   [junit4]    >        at junit.framework.Assert.fail(Assert.java:50)
   [junit4]    >        at 
org.apache.lucene.search.CheckHits.checkEqual(CheckHits.java:223)
   [junit4]    >        at 
org.apache.lucene.search.CheckHits.checkHitsQuery(CheckHits.java:205)
   [junit4]    >        at 
org.apache.lucene.search.TestBoolean2.queriesTest(TestBoolean2.java:196)
   [junit4]    >        at 
org.apache.lucene.search.TestBoolean2.testQueries10(TestBoolean2.java:329)
   [junit4]    >        at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> NOTE: test params are: codec=CheapBastard, 
sim=RandomSimilarity(queryNorm=true,coord=no): {field=DFR I(ne)LZ(0.3), 
field2=DFR I(n)2}, locale=zh-HK, timezone=Australia/ACT
   [junit4]   2> NOTE: Linux 3.19.0-51-generic amd64/Oracle Corporation 
1.8.0_74 (64-bit)/cpus=4,threads=1,free=281959656,total=315097088
   [junit4]   2> NOTE: All tests run in this JVM: [TestBoolean2]
   [junit4] Completed [1/1 (1!)] in 1.36s, 1 test, 1 failure <<< FAILURES!
{noformat}

But not every seed fails; for example...

{noformat}
   [junit4] <JUnit4> says שלום! Master seed: 153FC5D5F31DB0CE
   [junit4] Executing 1 suite with 1 JVM.
   [junit4] 
   [junit4] Started J0 PID(28077@localhost).
   [junit4] Suite: org.apache.lucene.search.TestBoolean2
   [junit4] OK      0.10s | TestBoolean2.testQueries09
   [junit4] OK      0.04s | TestBoolean2.testQueries05
   [junit4] OK      0.03s | TestBoolean2.testQueries08
   [junit4] OK      0.04s | TestBoolean2.testQueries06
   [junit4] OK      0.02s | TestBoolean2.testQueries02
   [junit4] OK      0.02s | TestBoolean2.testQueries10
   [junit4] OK      0.04s | TestBoolean2.testQueries03
   [junit4] OK      0.01s | TestBoolean2.testQueries01
   [junit4] OK      0.01s | TestBoolean2.testQueries04
   [junit4] OK      0.01s | TestBoolean2.testQueries07
   [junit4] OK      3.68s | TestBoolean2.testRandomQueries
   [junit4] Completed [1/1] in 29.83s, 11 tests
{noformat}




> BooleanQuery scores can be diff for same docs+sim when using coord (disagree 
> with Explanation which doesn't change)
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7132
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7132
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 5.5
>            Reporter: Ahmet Arslan
>            Assignee: Steve Rowe
>         Attachments: LUCENE-7132.patch, LUCENE-7132.patch, LUCENE-7132.patch, 
> LUCENE-7132.patch, LUCENE-7132.patch, LUCENE-7132.patch, SOLR-8884.patch, 
> SOLR-8884.patch, debug.xml
>
>
> Some of the folks 
> [reported|http://find.searchhub.org/document/80666f5c3b86ddda] that sometimes 
> explain's score can be different from the score requested via the fields 
> parameter. Interestingly, explain's scores would create a different ranking 
> than the original result list. This is something users experience, but it 
> cannot be reproduced deterministically.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
