Ruiqi Dong created CALCITE-7554:
-----------------------------------

             Summary: NlsString.compareTo() makes TreeSet/TreeMap collapse 
distinct values with different charset or collation metadata
                 Key: CALCITE-7554
                 URL: https://issues.apache.org/jira/browse/CALCITE-7554
             Project: Calcite
          Issue Type: Bug
          Components: core
    Affects Versions: 1.41.0
            Reporter: Ruiqi Dong


*Summary*
NlsString exposes a natural ordering that can silently merge distinct values in 
sorted collections. equals() and hashCode() include stringValue, bytesValue, 
charsetName, and collation. compareTo() compares only the decoded string value, 
optionally through the collator. As a result, two NlsString objects with 
identical text but different charset or collation metadata compare as equal in 
the natural ordering even though they are not equal as objects.
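
The same effect applies to sorted maps. The following dependency-free sketch illustrates it; Literal is a hypothetical stand-in for NlsString (not Calcite code) that only mirrors the described split between equals() and compareTo(). With java.util.TreeMap, the second put() finds a key that compares as equal, so it replaces the stored value while keeping the first key.

```java
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical stand-in for NlsString: text plus a charset name. */
final class Literal implements Comparable<Literal> {
  final String value;
  final String charsetName;

  Literal(String value, String charsetName) {
    this.value = Objects.requireNonNull(value);
    this.charsetName = charsetName;
  }

  // equals() and hashCode() include the charset metadata.
  @Override public boolean equals(Object o) {
    return o instanceof Literal
        && value.equals(((Literal) o).value)
        && Objects.equals(charsetName, ((Literal) o).charsetName);
  }

  @Override public int hashCode() {
    return Objects.hash(value, charsetName);
  }

  // compareTo() looks at the text only, as in the reported behavior.
  @Override public int compareTo(Literal other) {
    return value.compareTo(other.value);
  }
}

final class TreeMapCollapseDemo {
  static TreeMap<Literal, String> build() {
    TreeMap<Literal, String> map = new TreeMap<>();
    map.put(new Literal("foobar", "LATIN1"), "first");
    map.put(new Literal("foobar", "UTF8"), "second"); // compares as equal to the first key
    return map;
  }

  public static void main(String[] args) {
    TreeMap<Literal, String> map = build();
    System.out.println(map.size());                    // 1
    System.out.println(map.firstKey().charsetName);    // LATIN1 (first key kept)
    System.out.println(map.firstEntry().getValue());   // second (value overwritten)
  }
}
```

The surviving entry pairs the LATIN1 key with the value stored for the UTF8 key, which is harder to notice than a dropped set element.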
 
*Affected code*
File: core/src/main/java/org/apache/calcite/util/NlsString.java
{code:java}
@Override public int hashCode() {
  return Objects.hash(stringValue, bytesValue, charsetName, collation);
}

@Override public boolean equals(@Nullable Object obj) {
  return this == obj
      || obj instanceof NlsString
      && Objects.equals(stringValue, ((NlsString) obj).stringValue)
      && Objects.equals(bytesValue, ((NlsString) obj).bytesValue)
      && Objects.equals(charsetName, ((NlsString) obj).charsetName)
      && Objects.equals(collation, ((NlsString) obj).collation);
}

@Override public int compareTo(NlsString other) {
  if (collation != null && collation.getCollator() != null) {
    return collation.getCollator().compare(getValue(), other.getValue());
  }
  return getValue().compareTo(other.getValue());
}
{code}
 
*Reproducer* 
Add the following test to 
core/src/test/java/org/apache/calcite/util/MtClawCalciteBugTest.java:
{code:java}
@Test void testNlsStringNaturalOrderingKeepsDistinctCharsetMetadata() {
  final NlsString latin1 =
      new NlsString("foobar", "LATIN1", SqlCollation.IMPLICIT);
  final NlsString utf8 =
      new NlsString("foobar", "UTF8", SqlCollation.IMPLICIT);

  assertFalse(latin1.equals(utf8));

  final TreeSet<NlsString> values = new TreeSet<>();
  values.add(latin1);
  values.add(utf8);

  assertThat(values, hasSize(2));
}
{code}
Run:
{code:java}
./gradlew :core:test \
  --tests org.apache.calcite.util.MtClawCalciteBugTest.testNlsStringNaturalOrderingKeepsDistinctCharsetMetadata
{code}
Observed behavior:
TreeSet drops the second value, so the size assertion fails:
{code:java}
Expected: a collection with size <2>
     but: collection size was <1>
{code}
Expected behavior:
If charset or collation metadata makes two NlsString instances unequal, the 
natural ordering should not treat them as the same sorted-set or sorted-map key.
 
This is not just a contract mismatch. NlsString's natural ordering loses 
metadata that equals() deliberately preserves, so sorted collections keyed by 
NlsString can silently collapse distinct literals.
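
One possible direction, sketched here without Calcite on the classpath, is an ordering that breaks ties on the metadata when the text compares as equal. This is an illustration under assumptions, not a proposed patch: Text is a hypothetical stand-in for NlsString, collationName is a simplified placeholder for the collation, and CONSISTENT is an invented name. A caller could pass a comparator of this shape to the TreeSet or TreeMap constructor.

```java
import java.util.Comparator;
import java.util.Objects;
import java.util.TreeSet;

/** Hypothetical stand-in for NlsString: text plus charset and collation names. */
final class Text implements Comparable<Text> {
  final String value;
  final String charsetName;
  final String collationName;

  Text(String value, String charsetName, String collationName) {
    this.value = Objects.requireNonNull(value);
    this.charsetName = charsetName;
    this.collationName = collationName;
  }

  @Override public boolean equals(Object o) {
    return o instanceof Text
        && value.equals(((Text) o).value)
        && Objects.equals(charsetName, ((Text) o).charsetName)
        && Objects.equals(collationName, ((Text) o).collationName);
  }

  @Override public int hashCode() {
    return Objects.hash(value, charsetName, collationName);
  }

  // Natural ordering on the text only, as in the reported behavior.
  @Override public int compareTo(Text other) {
    return value.compareTo(other.value);
  }
}

final class TieBreakDemo {
  // Null-safe ordering for the metadata fields (assumed nullable here).
  static final Comparator<String> NULL_SAFE =
      Comparator.nullsFirst(Comparator.<String>naturalOrder());

  // Text first, then charset, then collation; returns 0 only when equals() is true.
  static final Comparator<Text> CONSISTENT =
      Comparator.<Text>naturalOrder()
          .thenComparing((Text t) -> t.charsetName, NULL_SAFE)
          .thenComparing((Text t) -> t.collationName, NULL_SAFE);

  static int sizeWith(Comparator<Text> comparator) {
    // A null comparator makes TreeSet fall back to the natural ordering.
    TreeSet<Text> values = new TreeSet<>(comparator);
    values.add(new Text("foobar", "LATIN1", "IMPLICIT"));
    values.add(new Text("foobar", "UTF8", "IMPLICIT"));
    return values.size();
  }

  public static void main(String[] args) {
    System.out.println(sizeWith(null));        // 1: natural ordering merges the values
    System.out.println(sizeWith(CONSISTENT));  // 2: tie-breaker keeps both
  }
}
```

The tie-breakers only run when the text already compares as equal, so the relative order of values with different text is unchanged.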
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
