Ruslan Torobaev created LUCENE-8380:
---------------------------------------

             Summary: UTF8TaxonomyWriterCache inconsistency
                 Key: LUCENE-8380
                 URL: https://issues.apache.org/jira/browse/LUCENE-8380
             Project: Lucene - Core
          Issue Type: Bug
          Components: modules/facet
    Affects Versions: 7.1
            Reporter: Ruslan Torobaev
         Attachments: lucene-taxonomy-cache-report.tar.gz,
                      taxonomy-cache.json.gz, taxonomy.tar.gz

I’m facing a problem with taxonomy writer cache inconsistency. At some point
UTF8TaxonomyWriterCache starts to return the wrong ord for some facet labels.
As a result, wrong ords are written to the documents' facet fields, and
searches return wrong counts (undercounts). The bug shows up on different
servers with different index contents (we have several separate indexes with
unique data).
Unfortunately, I can’t reproduce this behaviour in tests.
I've dumped a "broken" UTF8TaxonomyWriterCache instance and written an app
that loads it and compares it with the real taxonomy. The dumps and the app
are attached. To run the demo, extract the archives and execute:
{code}
mvn compile
mvn exec:java \
  -Dexec.mainClass="me.torobaev.lucene.taxonomy.cache.TaxonomyCacheCheck" \
  -DtaxonomyDir=../taxonomy/ -DcacheDump=../taxonomy-cache.json
{code}
As you can see in the output, the labels [frametype, 7] and
[modification_id, 682] have the same ord in the cache.
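
For readers without the attachments, the kind of check that exposes this symptom can be sketched roughly as follows. This is not the attached TaxonomyCacheCheck app; it is a minimal, Lucene-free illustration that assumes the cache dump has already been loaded into a plain label-to-ord map. The class name, method name, flattened label strings, and ord values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DuplicateOrdinalCheck {

    /**
     * Groups labels by ordinal and keeps only the ordinals that are
     * claimed by more than one label. In a consistent taxonomy cache
     * the result should be empty.
     */
    static Map<Integer, List<String>> findSharedOrdinals(Map<String, Integer> labelToOrd) {
        Map<Integer, List<String>> byOrd = new TreeMap<>();
        for (Map.Entry<String, Integer> e : labelToOrd.entrySet()) {
            byOrd.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        // Drop ordinals that map to exactly one label.
        byOrd.values().removeIf(labels -> labels.size() < 2);
        return byOrd;
    }

    public static void main(String[] args) {
        // Hypothetical data: the ord values are invented for illustration
        // and are not taken from the attached dump.
        Map<String, Integer> cache = new LinkedHashMap<>();
        cache.put("frametype/7", 101);
        cache.put("modification_id/682", 101);
        cache.put("frametype/8", 102);

        findSharedOrdinals(cache).forEach((ord, labels) ->
            System.out.println("ord " + ord + " is shared by " + labels));
    }
}
```

A check like this only finds labels that collide inside the cache. Detecting an ord that is unique in the cache but differs from the one stored in the taxonomy index would additionally require comparing each entry against the taxonomy itself.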



