[
https://issues.apache.org/jira/browse/CALCITE-7554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ruiqi Dong updated CALCITE-7554:
--------------------------------
Description:
*Summary*
NlsString exposes a natural ordering that can silently merge distinct values in
sorted collections. equals() and hashCode() include stringValue, bytesValue,
charsetName, and collation. compareTo() compares only the decoded string value,
optionally through the collator. As a result, two NlsString objects with
identical text but different charset or collation metadata compare as equal in
the natural ordering even though they are not equal as objects.
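The mismatch can be seen without Calcite on the classpath. The following is a minimal sketch, not the Calcite code: Tagged is a hypothetical stand-in for NlsString that keeps only a text value and a charset name, with equals()/hashCode() covering both fields and compareTo() looking at the text alone.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Sketch only. Tagged is a hypothetical stand-in for NlsString. */
final class InconsistentOrderingSketch {
  static final class Tagged implements Comparable<Tagged> {
    final String value;
    final String charsetName;

    Tagged(String value, String charsetName) {
      this.value = value;
      this.charsetName = charsetName;
    }

    // Equality covers the text and the charset metadata.
    @Override public boolean equals(Object obj) {
      return this == obj
          || obj instanceof Tagged
          && value.equals(((Tagged) obj).value)
          && charsetName.equals(((Tagged) obj).charsetName);
    }

    @Override public int hashCode() {
      return Objects.hash(value, charsetName);
    }

    // Ordering looks at the text only, like the reported compareTo().
    @Override public int compareTo(Tagged other) {
      return value.compareTo(other.value);
    }
  }

  /** Adds two values with the same text but different charsets; returns the set size. */
  static int sizeAfterAdding(Set<Tagged> set) {
    set.add(new Tagged("foobar", "LATIN1"));
    set.add(new Tagged("foobar", "UTF8"));
    return set.size();
  }

  public static void main(String[] args) {
    // The hash-based set uses equals()/hashCode() and keeps both values.
    System.out.println("HashSet size: " + sizeAfterAdding(new HashSet<>()));
    // The sorted set uses compareTo() and treats the second value as a duplicate.
    System.out.println("TreeSet size: " + sizeAfterAdding(new TreeSet<>()));
  }
}
```

With the same two inputs, the hash-based set ends up with two elements and the sorted set with one, which is the divergence this issue describes.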
*Affected code*
File: core/src/main/java/org/apache/calcite/util/NlsString.java
{code:java}
@Override public int hashCode() {
  return Objects.hash(stringValue, bytesValue, charsetName, collation);
}

@Override public boolean equals(@Nullable Object obj) {
  return this == obj
      || obj instanceof NlsString
      && Objects.equals(stringValue, ((NlsString) obj).stringValue)
      && Objects.equals(bytesValue, ((NlsString) obj).bytesValue)
      && Objects.equals(charsetName, ((NlsString) obj).charsetName)
      && Objects.equals(collation, ((NlsString) obj).collation);
}

@Override public int compareTo(NlsString other) {
  if (collation != null && collation.getCollator() != null) {
    return collation.getCollator().compare(getValue(), other.getValue());
  }
  return getValue().compareTo(other.getValue());
}
{code}
*Reproducer*
Add the following test to
core/src/test/java/org/apache/calcite/rex/RexBuilderTest.java:
{code:java}
@Test void testNlsStringNaturalOrderingKeepsDistinctCharsetMetadata() {
  final NlsString latin1 =
      new NlsString("foobar", "LATIN1", SqlCollation.IMPLICIT);
  final NlsString utf8 =
      new NlsString("foobar", "UTF8", SqlCollation.IMPLICIT);
  assertFalse(latin1.equals(utf8));

  final TreeSet<NlsString> values = new TreeSet<>();
  values.add(latin1);
  values.add(utf8);
  assertThat(values, hasSize(2));
}
{code}
Run:
{code:bash}
./gradlew :core:test \
  --tests org.apache.calcite.rex.RexBuilderTest.testNlsStringNaturalOrderingKeepsDistinctCharsetMetadata
{code}
Observed behavior:
TreeSet drops the second value, so the size assertion fails:
{code:none}
Expected: a collection with size <2>
     but: collection size was <1>
{code}
Expected behavior:
If charset or collation metadata makes two NlsString instances unequal, the
natural ordering should not treat them as the same sorted-set or sorted-map key.
This is not just a contract mismatch. NlsString's natural ordering loses
metadata that equals() deliberately preserves, so sorted collections keyed by
NlsString can silently collapse distinct literals.
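One possible direction, sketched here outside Calcite, is an ordering that compares the text first and then breaks ties on the metadata. Tagged and TEXT_THEN_CHARSET are hypothetical names used only for illustration. The sketch assumes the charset name may be absent (null), and it leaves out collation and collator-based comparison for brevity.

```java
import java.util.Comparator;
import java.util.TreeSet;

/** Sketch only. All names here are hypothetical, not Calcite API. */
final class TieBreakingOrderingSketch {
  /** Stand-in for NlsString: text plus an optional charset name. */
  static final class Tagged {
    final String value;
    final String charsetName; // may be null in this sketch

    Tagged(String value, String charsetName) {
      this.value = value;
      this.charsetName = charsetName;
    }
  }

  /** Orders by text, then by charset name (nulls first) to separate ties. */
  static final Comparator<Tagged> TEXT_THEN_CHARSET =
      Comparator.<Tagged, String>comparing(t -> t.value)
          .thenComparing(t -> t.charsetName,
              Comparator.<String>nullsFirst(Comparator.<String>naturalOrder()));

  /** Returns how many of the given values a sorted set keeps under this ordering. */
  static int distinctCount(Tagged... values) {
    final TreeSet<Tagged> set = new TreeSet<>(TEXT_THEN_CHARSET);
    for (Tagged value : values) {
      set.add(value);
    }
    return set.size();
  }

  public static void main(String[] args) {
    // Same text, different charset: both values survive in the sorted set.
    System.out.println(
        distinctCount(new Tagged("foobar", "LATIN1"), new Tagged("foobar", "UTF8")));
  }
}
```

Callers that cannot wait for a change to compareTo() could pass a comparator along these lines to the TreeSet or TreeMap constructor. Whether the natural ordering itself should adopt such tie-breakers is a design decision for the fix.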
was:
*Summary*
NlsString exposes a natural ordering that can silently merge distinct values in
sorted collections. equals() and hashCode() include stringValue, bytesValue,
charsetName, and collation. compareTo() compares only the decoded string value,
optionally through the collator. As a result, two NlsString objects with
identical text but different charset or collation metadata compare as equal in
the natural ordering even though they are not equal as objects.
*Affected code*
File: core/src/main/java/org/apache/calcite/util/NlsString.java
{code:java}
@Override public int hashCode() {
  return Objects.hash(stringValue, bytesValue, charsetName, collation);
}

@Override public boolean equals(@Nullable Object obj) {
  return this == obj
      || obj instanceof NlsString
      && Objects.equals(stringValue, ((NlsString) obj).stringValue)
      && Objects.equals(bytesValue, ((NlsString) obj).bytesValue)
      && Objects.equals(charsetName, ((NlsString) obj).charsetName)
      && Objects.equals(collation, ((NlsString) obj).collation);
}

@Override public int compareTo(NlsString other) {
  if (collation != null && collation.getCollator() != null) {
    return collation.getCollator().compare(getValue(), other.getValue());
  }
  return getValue().compareTo(other.getValue());
}
{code}
*Reproducer*
Add the following test to
core/src/test/java/org/apache/calcite/util/MtClawCalciteBugTest.java:
{code:java}
@Test void testNlsStringNaturalOrderingKeepsDistinctCharsetMetadata() {
  final NlsString latin1 =
      new NlsString("foobar", "LATIN1", SqlCollation.IMPLICIT);
  final NlsString utf8 =
      new NlsString("foobar", "UTF8", SqlCollation.IMPLICIT);
  assertFalse(latin1.equals(utf8));

  final TreeSet<NlsString> values = new TreeSet<>();
  values.add(latin1);
  values.add(utf8);
  assertThat(values, hasSize(2));
}
{code}
Run:
{code:bash}
./gradlew :core:test \
  --tests org.apache.calcite.util.MtClawCalciteBugTest.testNlsStringNaturalOrderingKeepsDistinctCharsetMetadata
{code}
Observed behavior:
TreeSet drops the second value, so the size assertion fails:
{code:none}
Expected: a collection with size <2>
     but: collection size was <1>
{code}
Expected behavior:
If charset or collation metadata makes two NlsString instances unequal, the
natural ordering should not treat them as the same sorted-set or sorted-map key.
This is not just a contract mismatch. NlsString's natural ordering loses
metadata that equals() deliberately preserves, so sorted collections keyed by
NlsString can silently collapse distinct literals.
> NlsString.compareTo() makes TreeSet/TreeMap collapse distinct values with
> different charset or collation metadata
> -----------------------------------------------------------------------------------------------------------------
>
> Key: CALCITE-7554
> URL: https://issues.apache.org/jira/browse/CALCITE-7554
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.41.0
> Reporter: Ruiqi Dong
> Priority: Major
>