Ruiqi Dong created CALCITE-7554:
-----------------------------------
Summary: NlsString.compareTo() makes TreeSet/TreeMap collapse
distinct values with different charset or collation metadata
Key: CALCITE-7554
URL: https://issues.apache.org/jira/browse/CALCITE-7554
Project: Calcite
Issue Type: Bug
Components: core
Affects Versions: 1.41.0
Reporter: Ruiqi Dong
*Summary*
NlsString exposes a natural ordering that can silently merge distinct values in
sorted collections. equals() and hashCode() include stringValue, bytesValue,
charsetName, and collation. compareTo() compares only the decoded string value,
optionally through the collator. As a result, two NlsString objects with
identical text but different charset or collation metadata compare as equal in
the natural ordering even though they are not equal as objects.
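
The mismatch can be illustrated without Calcite on the classpath. The following is a minimal sketch, not Calcite code: TaggedText and OrderingMismatchDemo are hypothetical stand-ins that only mimic the shape described above (metadata participates in equals() and hashCode() but not in compareTo()). A HashSet keeps both values, while a TreeSet, which decides membership by compareTo() == 0, keeps only one.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in, NOT Calcite code: it only mimics the shape of the
// mismatch (metadata in equals/hashCode, but not in compareTo).
class OrderingMismatchDemo {
  static final class TaggedText implements Comparable<TaggedText> {
    final String value;
    final String charsetName;

    TaggedText(String value, String charsetName) {
      this.value = value;
      this.charsetName = charsetName;
    }

    // Equality includes the charset metadata.
    @Override public boolean equals(Object obj) {
      return obj instanceof TaggedText
          && value.equals(((TaggedText) obj).value)
          && charsetName.equals(((TaggedText) obj).charsetName);
    }

    @Override public int hashCode() {
      return Objects.hash(value, charsetName);
    }

    // The natural ordering looks only at the text.
    @Override public int compareTo(TaggedText other) {
      return value.compareTo(other.value);
    }
  }

  // Adds two values with the same text but different charset metadata.
  static Set<TaggedText> fill(Set<TaggedText> set) {
    set.add(new TaggedText("foobar", "LATIN1"));
    set.add(new TaggedText("foobar", "UTF8"));
    return set;
  }

  public static void main(String[] args) {
    // Hash-based set uses equals()/hashCode(): both values survive.
    System.out.println("HashSet size: " + fill(new HashSet<>()).size()); // 2
    // Sorted set uses compareTo(): the second value is treated as a duplicate.
    System.out.println("TreeSet size: " + fill(new TreeSet<>()).size()); // 1
  }
}
```

The java.util.TreeSet documentation notes that a sorted set behaves this way whenever its ordering is not consistent with equals, which is why the loss is silent rather than an error.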
*Affected code*
File: core/src/main/java/org/apache/calcite/util/NlsString.java
{code:java}
@Override public int hashCode() {
  return Objects.hash(stringValue, bytesValue, charsetName, collation);
}

@Override public boolean equals(@Nullable Object obj) {
  return this == obj
      || obj instanceof NlsString
      && Objects.equals(stringValue, ((NlsString) obj).stringValue)
      && Objects.equals(bytesValue, ((NlsString) obj).bytesValue)
      && Objects.equals(charsetName, ((NlsString) obj).charsetName)
      && Objects.equals(collation, ((NlsString) obj).collation);
}

@Override public int compareTo(NlsString other) {
  if (collation != null && collation.getCollator() != null) {
    return collation.getCollator().compare(getValue(), other.getValue());
  }
  return getValue().compareTo(other.getValue());
}
{code}
*Reproducer*
Add the following test to
core/src/test/java/org/apache/calcite/util/MtClawCalciteBugTest.java:
{code:java}
@Test void testNlsStringNaturalOrderingKeepsDistinctCharsetMetadata() {
  final NlsString latin1 =
      new NlsString("foobar", "LATIN1", SqlCollation.IMPLICIT);
  final NlsString utf8 =
      new NlsString("foobar", "UTF8", SqlCollation.IMPLICIT);
  assertFalse(latin1.equals(utf8));

  final TreeSet<NlsString> values = new TreeSet<>();
  values.add(latin1);
  values.add(utf8);
  assertThat(values, hasSize(2));
}
{code}
Run:
{code:bash}
./gradlew :core:test \
  --tests org.apache.calcite.util.MtClawCalciteBugTest.testNlsStringNaturalOrderingKeepsDistinctCharsetMetadata
{code}
Observed behavior:
The second value is dropped by `TreeSet`, so the size assertion fails:
{noformat}
Expected: a collection with size <2>
     but: collection size was <1>
{noformat}
Expected behavior:
If charset or collation metadata makes two NlsString instances unequal, the
natural ordering should not treat them as the same sorted-set or sorted-map key.
This is not just a contract mismatch. NlsString's natural ordering loses
metadata that equals() deliberately preserves, so sorted collections keyed by
NlsString can silently collapse distinct literals.
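
One way to picture an ordering that meets this expectation is to break ties on the metadata once the text compares as equal. The sketch below is only an illustration under assumptions, not a proposed patch: TaggedText and TieBreakDemo are hypothetical stand-ins rather than Calcite classes, only the charset name is used as a tie-breaker (collation is omitted for brevity), and a null charset name is assumed to sort first.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical stand-in, NOT Calcite code: shows an ordering that returns 0
// only when both the text and the charset metadata match.
class TieBreakDemo {
  static final class TaggedText {
    final String value;
    final String charsetName; // may be null in this sketch

    TaggedText(String value, String charsetName) {
      this.value = value;
      this.charsetName = charsetName;
    }
  }

  // Compare the text first, then fall back to the charset name (nulls first).
  static final Comparator<TaggedText> VALUE_THEN_CHARSET =
      Comparator.comparing((TaggedText t) -> t.value)
          .thenComparing((TaggedText t) -> t.charsetName,
              Comparator.nullsFirst(Comparator.<String>naturalOrder()));

  static TreeSet<TaggedText> sample() {
    TreeSet<TaggedText> set = new TreeSet<>(VALUE_THEN_CHARSET);
    set.add(new TaggedText("foobar", "LATIN1"));
    set.add(new TaggedText("foobar", "UTF8"));
    set.add(new TaggedText("foobar", "UTF8")); // true duplicate, still collapsed
    return set;
  }

  public static void main(String[] args) {
    // Values that differ only in charset metadata are kept apart.
    System.out.println("TreeSet size: " + sample().size()); // 2
  }
}
```

Until the natural ordering changes, callers could apply the same idea by passing an explicit comparator to the TreeSet or TreeMap constructor; whether the text or the metadata should dominate the ordering is a design decision this sketch does not settle.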