[jira] [Updated] (LUCENE-6529) NumericFields + SlowCompositeReaderWrapper + UninvertedReader + -Dtests.codec=random can results in incorrect SortedSetDocValues

Hoss Man (JIRA) Mon, 08 Jun 2015 15:42:19 -0700

     [ 
https://issues.apache.org/jira/browse/LUCENE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Hoss Man updated LUCENE-6529:
-----------------------------
    Attachment: LUCENE-6529.patch

bq. Maybe BasePostingsFormatTestCase does not adequately exercise methods like 
size()/ord()/seek(ord). It should be failing!

FWIW, as far as I understand BasePostingsFormatTestCase and 
RandomPostingsTester from skimming them this morning, they may never 
reproduce this bug, since (AFAICT) they only ever operate on single-segment indexes?

As mentioned: this patch only ever fails for me when testing the 
SlowCompositeReaderWrapper. Asserts on the individual segment LeafReaders 
seem to pass every time (even though one segment is forced to contain every term 
in the index as a whole). Likewise, if you {{iw.forceMerge(1);}}, the 
SlowCompositeReaderWrapper asserts start to pass as well.
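To make the per-segment vs. composite distinction concrete: a composite view has to map each segment's local ords onto one global ord space, whose size is the number of distinct terms across all segments. Below is a hypothetical, stdlib-only Java sketch of that mapping. {{GlobalOrdSketch}} and its methods are illustrative names, not Lucene APIs, and this is not Lucene's actual merging code. With a single segment, as after {{forceMerge(1)}}, the local-to-global map is just the identity, so none of the merging logic gets exercised.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical, simplified model of how a composite (multi-segment) view can
// assign global ords; NOT Lucene's actual merging code.
// Each segment holds a sorted, de-duplicated term list whose local ords are 0..n-1.
class GlobalOrdSketch {

    /** Sorted union of all segment terms; its size is the composite value count. */
    static List<String> globalTerms(List<List<String>> segments) {
        TreeSet<String> union = new TreeSet<>();
        for (List<String> segment : segments) {
            union.addAll(segment);
        }
        return new ArrayList<>(union);
    }

    /** maps[s][localOrd] = global ord of segment s's term at localOrd. */
    static int[][] localToGlobal(List<List<String>> segments) {
        List<String> global = globalTerms(segments);
        int[][] maps = new int[segments.size()][];
        for (int s = 0; s < segments.size(); s++) {
            List<String> segment = segments.get(s);
            maps[s] = new int[segment.size()];
            for (int localOrd = 0; localOrd < segment.size(); localOrd++) {
                // The global list is sorted, so binary search yields the term's global ord.
                maps[s][localOrd] = Collections.binarySearch(global, segment.get(localOrd));
            }
        }
        return maps;
    }

    public static void main(String[] args) {
        List<List<String>> segments = Arrays.asList(
                Arrays.asList("a", "b", "c", "d"), // one segment holds every term
                Arrays.asList("b", "d"));
        System.out.println(globalTerms(segments).size());                // 4
        System.out.println(Arrays.toString(localToGlobal(segments)[1])); // [1, 3]
    }
}
```

Even though the first segment alone has every term, the second segment's ords (0, 1) must still be remapped to global ords (1, 3); that remapping step is exactly what a single-segment index never needs.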

----

I've updated the patch to include the test from SOLR-7631, and beefed 
up TestUninvertingReader.testSortedSetIntegerManyValues to cover all 4 
permutations of multi/single-valued + with/without precisionStep (this didn't turn up 
anything unexpected: only the trie fields are problematic) and to run 
{{TestUtil.checkReader}} on the SlowCompositeReaderWrapper before using it.  
This last change started triggering failures much earlier...

{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestUninvertingReader -Dtests.method=testSortedSetIntegerManyValues -Dtests.seed=3A8A592786F36F30 -Dtests.slow=true -Dtests.locale=in_ID -Dtests.timezone=Zulu -Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] ERROR   0.56s | TestUninvertingReader.testSortedSetIntegerManyValues <<<
   [junit4]    > Throwable #1: java.lang.RuntimeException: dv for field: trie_multi reports wrong maxOrd=33 but this is not the case: 30
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([3A8A592786F36F30:DB56E81A1372E276]:0)
   [junit4]    >        at org.apache.lucene.index.CheckIndex.checkSortedSetDocValues(CheckIndex.java:1917)
   [junit4]    >        at org.apache.lucene.index.CheckIndex.checkDocValues(CheckIndex.java:1987)
   [junit4]    >        at org.apache.lucene.index.CheckIndex.testDocValues(CheckIndex.java:1790)
   [junit4]    >        at org.apache.lucene.util.TestUtil.checkReader(TestUtil.java:318)
   [junit4]    >        at org.apache.lucene.util.TestUtil.checkReader(TestUtil.java:297)
   [junit4]    >        at org.apache.lucene.uninverting.TestUninvertingReader.testSortedSetIntegerManyValues(TestUninvertingReader.java:284)
{noformat}
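The {{checkReader}} failure above comes from a consistency check: the value count a SortedSetDocValues reports should agree with the ords that documents actually reference. Below is a hypothetical, stdlib-only model of that kind of check, reusing the error message from the stack trace. {{SortedSetCheckSketch}} and its {{long[][]}} input (one sorted ord array per document) are illustrative assumptions; this is a simplification, not CheckIndex's real implementation.

```java
// Hypothetical, simplified model of a "reported maxOrd vs. ords actually used"
// check; NOT CheckIndex's actual implementation.
class SortedSetCheckSketch {

    /**
     * docOrds[doc] holds that document's ords in strictly increasing order.
     * Throws if the reported value count disagrees with the ords seen.
     */
    static void check(String field, long valueCount, long[][] docOrds) {
        long reportedMaxOrd = valueCount - 1;
        long seenMaxOrd = -1;
        for (int doc = 0; doc < docOrds.length; doc++) {
            long lastOrd = -1;
            for (long ord : docOrds[doc]) {
                if (ord <= lastOrd) {
                    throw new RuntimeException("ords out of order for doc " + doc);
                }
                if (ord > reportedMaxOrd) {
                    throw new RuntimeException("ord out of bounds: " + ord);
                }
                lastOrd = ord;
                seenMaxOrd = Math.max(seenMaxOrd, ord);
            }
        }
        if (reportedMaxOrd != seenMaxOrd) {
            throw new RuntimeException("dv for field: " + field + " reports wrong maxOrd="
                    + reportedMaxOrd + " but this is not the case: " + seenMaxOrd);
        }
    }

    public static void main(String[] args) {
        long[][] docs = { {0, 2}, {1}, {2, 3} };
        check("ok_field", 4, docs); // passes: highest ord seen is 3 == 4 - 1
        try {
            check("bad_field", 7, docs); // claims 7 values, but ords 4..6 are never used
        } catch (RuntimeException e) {
            // prints: dv for field: bad_field reports wrong maxOrd=6 but this is not the case: 3
            System.out.println(e.getMessage());
        }
    }
}
```

The point of running such a check on the composite reader, rather than only per segment, is that an inflated value count can exist only in the merged view.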

...so for good measure, I sprinkled {{TestUtil.checkReader}} into some of the 
other oal.uninverting.* tests I could find that use SlowCompositeReaderWrapper, but 
based on my limited beasting, this hasn't triggered any other failures.

(note: patch still has nocommits related to limiting some of the random 
variables)

----

bq. If i disable the ord-sharing optimization in DocTermOrds, all 3 seeds pass. 
So I think there is a bug in e.g. FixedGap/BlockTerms dictionary or something 
like that.

My inclination would be that we should remove this optimization for 5.2.1, 
commit these tests, and open a new issue to re-add the optimization if/when it 
can be done in such a way that these tests pass reliably.

What do folks think?
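One hypothetical way to picture why an ord-sharing optimization can go wrong: an uninverter can either count the terms it enumerates itself, or trust the term dictionary's own {{ord()}} values. The plain-Java sketch below is an illustration only; {{OrdSharingSketch}} and the simulated faulty dictionary are invented for this purpose, and this is not DocTermOrds' actual logic. It shows that the "shared" value count is only as correct as the dictionary's ords.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical model, NOT DocTermOrds' actual code. Two ways to derive the value
// count for a contiguous run of terms from a sorted term dictionary:
//   counting: enumerate the terms and count them directly;
//   sharing:  trust the dictionary's ord() for the first and last term.
class OrdSharingSketch {

    static long countingValueCount(List<String> terms) {
        long count = 0;
        for (String ignored : terms) {
            count++;
        }
        return count;
    }

    static long sharedValueCount(List<String> terms, ToLongFunction<String> dictOrd) {
        // Only correct if dictOrd assigns consecutive ords to consecutive terms.
        return dictOrd.applyAsLong(terms.get(terms.size() - 1))
                - dictOrd.applyAsLong(terms.get(0)) + 1;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("10", "20", "30", "40");
        ToLongFunction<String> exactOrd = terms::indexOf;
        // Simulated faulty dictionary: ords come out as 0, 1, 3, 4 (ord 2 is skipped).
        ToLongFunction<String> faultyOrd = t -> terms.indexOf(t) + terms.indexOf(t) / 2;

        System.out.println(countingValueCount(terms));          // 4
        System.out.println(sharedValueCount(terms, exactOrd));  // 4
        System.out.println(sharedValueCount(terms, faultyOrd)); // 5
    }
}
```

Disabling the sharing path falls back to the counting path, which does not depend on the codec's {{ord()}} at all; that is consistent with the observation that all 3 seeds pass without the optimization.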


> NumericFields + SlowCompositeReaderWrapper + UninvertedReader + 
> -Dtests.codec=random can results in incorrect SortedSetDocValues 
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-6529
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6529
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Hoss Man
>         Attachments: LUCENE-6529.patch, LUCENE-6529.patch
>
>
> Digging into SOLR-7631 and SOLR-7605, I became fairly confident that the only 
> explanation of the behavior I was seeing was some sort of bug in either the 
> randomized codec/postings-format or the UninvertingReader that was only 
> evident when the two were combined and used on a multivalued Numeric Field using 
> precision steps. But since I couldn't find any -Dtests.codec or 
> -Dtests.postings.format options that would cause the bug 100% regardless of 
> seed, I switched tactics and focused on reproducing the problem using 
> UninvertingReader directly and checking SortedSetDocValues.getValueCount().
> I now have a test that fails frequently (and consistently for any seed I 
> find), but only with -Dtests.codec=random; override it with 
> -Dtests.codec=default and everything works fine. (Based on the exhaustive 
> testing I did in the linked issues, I suspect every named codec works fine, 
> but I didn't redo that testing here.)
> The failures only seem to happen when checking the 
> SortedSetDocValues.getValueCount() of a SlowCompositeReaderWrapper around the 
> UninvertingReader, which suggests the root bug may actually be in 
> SlowCompositeReaderWrapper (while still having some dependency on the random 
> codec).
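For background on why only trie fields are affected: with a precisionStep, each numeric value is indexed as several terms, one per shift, and only the full-precision (shift 0) terms correspond to actual values. The stdlib-only sketch below models that expansion for 32-bit ints. {{TrieTermsSketch}} and the {{"shift:prefix"}} term format are hypothetical simplifications; Lucene's real encoding is prefix-coded bytes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical, simplified model of trie ("precisionStep") indexing for 32-bit ints:
// each value is indexed once per shift 0, step, 2*step, ... < 32, as the value with
// its low `shift` bits dropped. Terms here are plain "shift:prefix" strings.
class TrieTermsSketch {

    static List<String> termsFor(int value, int precisionStep) {
        List<String> terms = new ArrayList<>();
        for (int shift = 0; shift < 32; shift += precisionStep) {
            terms.add(shift + ":" + (value >>> shift));
        }
        return terms;
    }

    public static void main(String[] args) {
        int[] values = {3, 17, 1000};
        TreeSet<String> allTerms = new TreeSet<>();
        TreeSet<String> fullPrecision = new TreeSet<>();
        for (int v : values) {
            List<String> terms = termsFor(v, 8);
            allTerms.addAll(terms);
            fullPrecision.add(terms.get(0)); // shift 0 keeps the exact value
        }
        System.out.println(fullPrecision.size()); // 3
        System.out.println(allTerms.size());      // 7
    }
}
```

With precisionStep=8, the three values produce 7 distinct terms (lower-precision terms such as {{8:0}} are shared between values), yet only the 3 full-precision terms should become ords in the uninverted SortedSetDocValues. Any miscounting of which terms fall in that range shows up directly as a wrong value count.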



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
