[
https://issues.apache.org/jira/browse/LUCENE-8592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712958#comment-16712958
]
Michael McCandless commented on LUCENE-8592:
--------------------------------------------
Phew this is evil, nice catch [~jim.ferenczi]! Thank you for adding reverse
true/false testing through all the test cases. The patch looks good – moving
the {{reverseMul}} logic to after the comparison.
This may have high impact? Any time the user sorts by an int field, reversed,
and has missing values, their index will now fail {{CheckIndex}} on upgrade and
require reindexing from their original docs.
Whether this fix is buggy depends on the return values of e.g.
{{Integer.compareTo}}, {{String.compareTo}}, etc.? I.e. do those methods ever
return {{Integer.MIN_VALUE}}. Looking at [Integer.java in at least
JDK8|http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/lang/Integer.java#l1233]
it seems we are good – it returns 0, 1, -1. And [String.java returns the
difference of two
{{chars}}|http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/lang/String.java#l1140].
Hopefully all other native {{compareTo}} impls are similar.
Should we add a test case confirming {{CheckIndex}} detects this bug? Create a
broken index and zip it up and commit that, with a test that unzips and
confirms {{CheckIndex}} fails on it?
Instead of the (int) casts e.g. in {{return (int)
globalOrds.get(readerValues.ordValue())}}, maybe we should switch to
{{Math.toIntExact}}? Because these are single values, it should never happen
that the long ord exceeds positive int space, so the conversion should always
work safely.
> MultiSorter#sort incorrectly sort Integer/Long#MIN_VALUE when the natural
> sort is reversed
> ------------------------------------------------------------------------------------------
>
> Key: LUCENE-8592
> URL: https://issues.apache.org/jira/browse/LUCENE-8592
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Jim Ferenczi
> Priority: Major
> Attachments: LUCENE-8592.patch
>
>
> MultiSorter#getComparableProviders on an integer or long field doesn't handle
> MIN_VALUE correctly when the natural order is reversed. To handle reverse
> sort we use the negation of the value but there is no check for overflows so
> MIN_VALUE for ints and longs are always sorted first (even if the natural
> order is reversed).
> This method is used by index sorting when merging already sorted segments
> together. This means that a sorted index can be incorrectly sorted if it uses
> a reverse sort and a missing value set to MIN_VALUE (long or int or values
> inside the segment that are equals to MIN_VALUE).
> This a bad bug because it affects the documents order inside segments and
> only a reindex can restore the correct sort order.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]