[
https://issues.apache.org/jira/browse/LUCENE-8595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711868#comment-16711868
]
Adrien Grand commented on LUCENE-8595:
--------------------------------------
I found the issue: the dedup logic is broken when a value is both removed from
and added to the same document in a single batch. Here is a patch:
{code:java}
diff --git a/lucene/core/src/java/org/apache/lucene/index/DocValuesFieldUpdates.java b/lucene/core/src/java/org/apache/lucene/index/DocValuesFieldUpdates.java
index 9bf9179..b0ad088 100644
--- a/lucene/core/src/java/org/apache/lucene/index/DocValuesFieldUpdates.java
+++ b/lucene/core/src/java/org/apache/lucene/index/DocValuesFieldUpdates.java
@@ -392,9 +392,13 @@ abstract class DocValuesFieldUpdates implements Accountable {
}
long longDoc = docs.get(idx);
++idx;
- while (idx < size && docs.get(idx) == longDoc) {
+ for (; idx < size; idx++) {
// scan forward to last update to this doc
- ++idx;
+ final long nextLongDoc = docs.get(idx);
+ if ((longDoc >>> 1) != (nextLongDoc >>> 1)) {
+ break;
+ }
+ longDoc = nextLongDoc;
}
hasValue = (longDoc & HAS_VALUE_MASK) > 0;
if (hasValue) {
{code}
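To make the failure mode concrete, here is a simplified, self-contained model of the scan loop before and after the patch. It is a sketch under assumptions, not Lucene code. It assumes each entry packs the docID into the upper bits and a "has value" flag into the lowest bit, as the `>>> 1` and `HAS_VALUE_MASK` in the patch suggest. It also assumes entries are already ordered by docID, with updates to the same document kept in arrival order, and it ignores the values array entirely. `DedupSketch`, `encode`, `dedupOld` and `dedupNew` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the dedup scan; NOT the real DocValuesFieldUpdates code.
public class DedupSketch {
  // Assumed encoding: docID in the upper bits, lowest bit set when the update carries a value.
  static final long HAS_VALUE_MASK = 1L;

  static long encode(int doc, boolean hasValue) {
    return ((long) doc << 1) | (hasValue ? HAS_VALUE_MASK : 0L);
  }

  /** Pre-patch scan: only skips entries whose whole encoded long is identical. */
  static List<String> dedupOld(long[] docs) {
    List<String> out = new ArrayList<>();
    int idx = 0;
    while (idx < docs.length) {
      long longDoc = docs[idx];
      ++idx;
      // A reset (flag 0) followed by a set (flag 1) for the same doc is NOT skipped here.
      while (idx < docs.length && docs[idx] == longDoc) {
        ++idx;
      }
      out.add((longDoc >>> 1) + ":" + ((longDoc & HAS_VALUE_MASK) > 0));
    }
    return out;
  }

  /** Patched scan: compares docIDs only and keeps the last entry for each doc. */
  static List<String> dedupNew(long[] docs) {
    List<String> out = new ArrayList<>();
    int idx = 0;
    while (idx < docs.length) {
      long longDoc = docs[idx];
      ++idx;
      for (; idx < docs.length; idx++) {
        final long nextLongDoc = docs[idx];
        if ((longDoc >>> 1) != (nextLongDoc >>> 1)) {
          break; // next entry belongs to a different doc
        }
        longDoc = nextLongDoc; // later update to the same doc wins
      }
      out.add((longDoc >>> 1) + ":" + ((longDoc & HAS_VALUE_MASK) > 0));
    }
    return out;
  }

  public static void main(String[] args) {
    // Doc 5 is reset (no value) and then set again in the same batch; doc 7 is set once.
    long[] docs = { encode(5, false), encode(5, true), encode(7, true) };
    System.out.println(dedupOld(docs)); // [5:false, 5:true, 7:true]
    System.out.println(dedupNew(docs)); // [5:true, 7:true]
  }
}
```

In this model the old loop emits doc 5 twice because the two encoded longs differ only in the flag bit. The patched loop compares docIDs (`>>> 1`) and keeps only the last entry for each document.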
We have had this bug since we introduced the ability to reset values in
LUCENE-8298. Recent changes only made it more visible, since until recently you
had to update the same document via two different terms to trigger it.
> TestMixedDocValuesUpdates.testTryUpdateMultiThreaded fails
> ----------------------------------------------------------
>
> Key: LUCENE-8595
> URL: https://issues.apache.org/jira/browse/LUCENE-8595
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: master (8.0)
> Reporter: Michael McCandless
> Priority: Major
>
> It does reproduce ... I haven't dug in:
>
> {noformat}
> [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestMixedDocValuesUpdates -Dtests.method=testTryUpdateMultiThreaded -Dtests.seed=E079543483688908 -Dtests.badapples=true -Dtests.locale=mt-MT -Dtests.timezone=VST -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
> [junit4] FAILURE 0.69s | TestMixedDocValuesUpdates.testTryUpdateMultiThreaded <<<
> [junit4] > Throwable #1: java.lang.AssertionError: docID: 63
> [junit4] > at __randomizedtesting.SeedInfo.seed([E079543483688908:4809171572AE9A81]:0)
> [junit4] > at org.apache.lucene.index.TestMixedDocValuesUpdates.testTryUpdateMultiThreaded(TestMixedDocValuesUpdates.java:526)
> [junit4] > at java.lang.Thread.run(Thread.java:745)
> [junit4] 2> NOTE: test params are: codec=Asserting(Lucene80): {id=PostingsFormat(name=LuceneVarGapFixedInterval)}, docValues:{value=DocValuesFormat(name=Lucene70)}, maxPointsInLeafNode=1312, maxMBSortInHeap=7.5990910168370895, sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@e08c0f3), locale=mt-MT, timezone=VST
> [junit4] 2> NOTE: Linux 4.4.0-92-generic amd64/Oracle Corporation 1.8.0_121 (64-bit)/cpus=8,threads=1,free=446496544,total=514850816
> [junit4] 2> NOTE: All tests run in this JVM: [TestMixedDocValuesUpdates]
> [junit4] Completed [1/1 (1!)] in 0.83s, 1 test, 1 failure <<< FAILURES!
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)