Ruslan Torobaev created LUCENE-8380:
---------------------------------------
Summary: UTF8TaxonomyWriterCache inconsistency
Key: LUCENE-8380
URL: https://issues.apache.org/jira/browse/LUCENE-8380
Project: Lucene - Core
Issue Type: Bug
Components: modules/facet
Affects Versions: 7.1
Reporter: Ruslan Torobaev
Attachments: lucene-taxonomy-cache-report.tar.gz,
taxonomy-cache.json.gz, taxonomy.tar.gz
I’m facing a problem with taxonomy writer cache inconsistency. At some point,
UTF8TaxonomyWriterCache starts to return the wrong ord for some facet labels.
As a result, wrong ords are written to the documents' facet fields, and
searches return wrong counts (undercounts). The bug shows up on different
servers with different index contents (we have several separate indexes, each
with unique data).
Unfortunately, I can’t reproduce this behaviour in tests.
I've dumped a "broken" UTF8TaxonomyWriterCache instance and created an app that
loads it and compares it with the real taxonomy. The dumps and the app are
attached. To run the demo, extract the archives and execute:
{code}
mvn compile
mvn exec:java \
  -Dexec.mainClass="me.torobaev.lucene.taxonomy.cache.TaxonomyCacheCheck" \
  -DtaxonomyDir=../taxonomy/ -DcacheDump=../taxonomy-cache.json
{code}
As you can see, the labels [frametype, 7] and [modification_id, 682] have the
same ord in the cache.
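The kind of check that exposes this symptom can be sketched as follows. This is a minimal illustration and not the code of the attached app: it assumes the cache dump has already been flattened into a label-to-ord map, and the class name, the method name and the ord values in main are hypothetical. In a consistent taxonomy every label has its own ord, so any ord shared by two labels points to a corrupted cache entry.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Finds ords that a label-to-ord dump assigns to more than one label. */
public class DuplicateOrdCheck {

    /**
     * Groups labels by ord and keeps only the ords shared by two or more labels.
     * An empty result means the dump contains no ord collisions.
     */
    public static Map<Integer, List<String>> findDuplicateOrds(Map<String, Integer> labelToOrd) {
        Map<Integer, List<String>> byOrd = new TreeMap<>();
        for (Map.Entry<String, Integer> e : labelToOrd.entrySet()) {
            byOrd.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        // Drop ords that belong to exactly one label; those are consistent.
        byOrd.values().removeIf(labels -> labels.size() < 2);
        return byOrd;
    }

    public static void main(String[] args) {
        // Hypothetical ord values, chosen only to illustrate the collision.
        Map<String, Integer> dump = new LinkedHashMap<>();
        dump.put("frametype/7", 101);
        dump.put("modification_id/682", 101);
        dump.put("frametype/8", 102);

        findDuplicateOrds(dump).forEach((ord, labels) ->
                System.out.println("ord " + ord + " is shared by " + labels));
    }
}
```

A collision found this way only shows that the cache is inconsistent. Deciding which of the colliding labels actually owns the ord still requires a lookup in the real taxonomy.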
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)