OK, I dug a bit, specifically into this test failure:
ant test -Dtestcase=TestBoolean2 -Dtests.method=testQueries01 -Dtests.seed=5787EE10A58E0A9C -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=nn-NO -Dtests.timezone=America/St_Vincent -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
and something else is at play: this particular test case uses
ConjunctionScorer, not BooleanScorer (where the original bug was).
What happens for this failing seed is that the correct 2 documents match, but
the 2nd one unexpectedly gets a better score, possibly only when enough
filler docs were added. I think it's a poor test because it seems to rely
on ClassicSimilarity valuing a shorter document (5 vs 6 tokens) more than
a higher tf for term w3 (2 vs 1), which is bad. Really our tests should
not rely on specific scoring factors.
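To make that tradeoff concrete, here is a rough, self-contained Java sketch (not Lucene code) that re-implements ClassicSimilarity-style scoring by hand: tf = sqrt(freq), idf = 1 + ln(maxDoc / (docFreq + 1)), and a 1/sqrt(length) norm truncated to 2 explicit mantissa bits to approximate the lossy one-byte norm encoding. Everything else is an assumption for illustration: a second required term xx with tf=1 in both docs, hypothetical doc frequencies (w3 in 4 docs, xx in 2), and 2048 filler docs (assuming that is BooleanScorer.SIZE) that contain neither term but still grow maxDoc. Under those assumptions the shorter doc wins with no filler, while the doc with tf=2 wins once the filler docs inflate both idfs. That would fit the hunch that the order only flips with filler docs, though the sketch does not model the real test index.

```java
// Hypothetical sketch, not Lucene code: hand-rolled ClassicSimilarity-style
// TF-IDF comparing two docs that match an assumed "+w3 +xx" conjunction.
public class ClassicScoreSketch {

    // ClassicSimilarity-style tf: sqrt(freq)
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // ClassicSimilarity-style idf: 1 + ln(maxDoc / (docFreq + 1))
    static double idf(long docFreq, long maxDoc) {
        return Math.log(maxDoc / (double) (docFreq + 1)) + 1.0;
    }

    // lengthNorm = 1/sqrt(numTerms), truncated to 2 explicit mantissa bits
    // to approximate the lossy one-byte norm encoding (assumption).
    static double norm(int numTerms) {
        float raw = (float) (1.0 / Math.sqrt(numTerms));
        int bits = Float.floatToRawIntBits(raw);
        return Float.intBitsToFloat(bits & 0xFFE00000);
    }

    // Sum of tf * idf^2 * norm over both required terms; coord and queryNorm
    // are omitted because they are identical for the two docs compared here.
    static double score(int freqW3, int freqXx, int numTerms, double idfW3, double idfXx) {
        double n = norm(numTerms);
        return (tf(freqW3) * idfW3 * idfW3 + tf(freqXx) * idfXx * idfXx) * n;
    }

    // Returns {shortDocScore, longDocScore} for a given index size.
    // Hypothetical doc frequencies: w3 in 4 docs, xx in 2 docs.
    static double[] scores(long maxDoc) {
        double idfW3 = idf(4, maxDoc);
        double idfXx = idf(2, maxDoc);
        double shortDoc = score(1, 1, 5, idfW3, idfXx); // 5 tokens, w3 tf=1
        double longDoc = score(2, 1, 6, idfW3, idfXx);  // 6 tokens, w3 tf=2
        return new double[] {shortDoc, longDoc};
    }

    public static void main(String[] args) {
        // 4 docs alone vs. 4 docs plus 2048 filler docs containing neither term
        for (long maxDoc : new long[] {4, 4 + 2048}) {
            double[] s = scores(maxDoc);
            System.out.printf("maxDoc=%d short=%.4f long=%.4f%n", maxDoc, s[0], s[1]);
        }
    }
}
```

The point is only that the ranking hinges on norm quantization versus idf magnitudes, which is exactly the kind of scoring detail a test should not depend on.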
Net/net this seems like a test bug, but I'm not sure how to fix it.
Mike McCandless
http://blog.mikemccandless.com
On Fri, Jun 17, 2016 at 6:05 AM, Michael McCandless <[email protected]> wrote:
> I'll dig.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Jun 16, 2016 at 10:55 PM, Steve Rowe <[email protected]> wrote:
>
>> Thanks for looking Hoss.
>>
>> I compared files changed by the commits on branch_6x and on branch_5_5,
>> and I don’t see anything consequential, so I don’t think this is a case of
>> a misapplied backport.
>>
>> --
>> Steve
>> www.lucidworks.com
>>
>> > On Jun 16, 2016, at 6:25 PM, Chris Hostetter <[email protected]> wrote:
>> >
>> >
>> > : : I ran this test before I committed the backport, but it succeeded then.
>> > : : I beasted it on current branch_5_5 and 49/100 seeds succeeded.
>> > :
>> > : one of the things that changed as part of LUCENE-7132 was that I made all
>> > : the BQ-related tests start randomizing setDisableCoord() ... so you might
>> > : be seeing some previously unidentified coord-related bug that is only in
>> > : the 5.x line of code?
>> > :
>> > : that could probably jibe with the roughly 50% failure ratio you're
>> > : seeing?
>> >
>> > Hmmm .... nope. Even with the setDisableCoord commented out (but still
>> > consuming random().nextBoolean() consistently) the same seeds reliably
>> > fail on branch_5_5.
>> >
>> > Looks like the "~50%" comes from the "use filler docs or not?" bit of the
>> > test? With the patch below I can't find any failing seeds, which makes
>> > it seem like the crux of the original bug (results incorrect when docs are
>> > in different blocks) is still relevant even after the backport to branch_5_5.
>> >
>> > Mike -- any idea what might still be the problem here?
>> >
>> >
>> >
>> > -Hoss
>> > http://www.lucidworks.com/
>> >
>> >
>> > diff --git a/lucene/core/src/test/org/apache/lucene/search/TestBoolean2.java b/lucene/core/src/test/org/apache/lucene/search/TestBoolean2.java
>> > index d97d8d4..596eb64 100644
>> > --- a/lucene/core/src/test/org/apache/lucene/search/TestBoolean2.java
>> > +++ b/lucene/core/src/test/org/apache/lucene/search/TestBoolean2.java
>> > @@ -67,6 +67,7 @@ public class TestBoolean2 extends LuceneTestCase {
>> >    public static void beforeClass() throws Exception {
>> >      // in some runs, test immediate adjacency of matches - in others, force a full bucket gap betwen docs
>> >      NUM_FILLER_DOCS = random().nextBoolean() ? 0 : BooleanScorer.SIZE;
>> > +    NUM_FILLER_DOCS = 0; // nocommit
>> >      PRE_FILLER_DOCS = TestUtil.nextInt(random(), 0, (NUM_FILLER_DOCS / 2));
>> >
>> >      directory = newDirectory();
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: [email protected]
>> > For additional commands, e-mail: [email protected]
>> >
>>
>