[ 
https://issues.apache.org/jira/browse/CALCITE-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18088453#comment-18088453
 ] 

Ruiqi Dong commented on CALCITE-7553:
-------------------------------------

Following the PR feedback:
 
The issue seems to be that Calcite currently uses Java Comparable in places 
where two different semantics are possible:
 
1. Java/natural/syntactic ordering
   - Used by equals/hashCode consistency and Java collections.
   - For TimestampWithTimeZoneString, this should distinguish values with 
different canonical strings, even if they represent the same instant.
 
2. SQL value comparison
   - Used by SQL evaluation, RexSimplify, RexInterpreter, constant folding, and 
eventually Sarg.
   - For TIMESTAMP_TZ, this should compare instants.
 
There seem to be two possible designs:
 
Option A: add an interface, for example ComparableSqlValue
- Calcite-owned literal value classes such as TimestampWithTimeZoneString could 
implement it.
- The interface could expose a SQL-value comparison method.
- This makes the special semantics explicit on the value type.
 
Option B: add a central SQL value Comparator
- Rex/RexBuilder/Sarg-related code would call a shared comparator instead of 
natural Comparable where SQL semantics are required.
- This may be less intrusive initially, because many literal values are 
existing Java/JDK types and cannot all implement a Calcite interface.
- Calcite-owned temporal classes could still implement an interface later if 
that is preferred.
 
My current preference is Option B as the first step, because it separates SQL 
comparison from Java natural ordering without requiring wrappers or interface 
implementations for every literal value type. But I am happy to follow the 
direction preferred by the project.
 
Once we agree on the design, I can update the PR accordingly.

> TimestampWithTimeZoneString.compareTo() makes TreeSet/TreeMap drop distinct 
> timestamp-with-zone values
> ------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-7553
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7553
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.41.0
>            Reporter: Ruiqi Dong
>            Priority: Minor
>              Labels: pull-request-available
>
> *Summary*
> TimestampWithTimeZoneString exposes a natural ordering that can silently drop 
> distinct values from TreeSet and TreeMap. equals() and hashCode() use the 
> canonical string form, including the local timestamp text and the zone ID. 
> compareTo(), however, compares only the parsed Calendar instant. As a result, 
> two distinct values such as: 
>  * 1969-07-21 02:56:15 GMT-08:00
>  * 1969-07-21 10:56:15 GMT
> are not equal as objects, but compare as equal in the natural ordering. Any 
> sorted collection keyed on this class can therefore silently discard one of 
> them.
>  
> *Affected code*
> File: 
> core/src/main/java/org/apache/calcite/util/TimestampWithTimeZoneString.java
> {code:java}
> @Override public boolean equals(@Nullable Object o) {
>   return o == this
>       || o instanceof TimestampWithTimeZoneString
>       && ((TimestampWithTimeZoneString) o).v.equals(v);
> }
> @Override public int hashCode() {
>   return v.hashCode();
> }
> @Override public int compareTo(TimestampWithTimeZoneString o) {
>   return this.pt.getCalendar().compareTo(o.pt.getCalendar());
> } {code}
> Reproducer
> Add the following test to: 
> core/src/test/java/org/apache/calcite/rex/RexBuilderTest.java
> {code:java}
> @Test void 
> testTimestampWithTimeZoneStringNaturalOrderingKeepsDistinctValues() {
>   final TimestampWithTimeZoneString pst =
>       new TimestampWithTimeZoneString("1969-07-21 02:56:15 GMT-08:00");
>   final TimestampWithTimeZoneString gmt =
>       new TimestampWithTimeZoneString("1969-07-21 10:56:15 GMT");
>   assertFalse(pst.equals(gmt));
>   final TreeSet<TimestampWithTimeZoneString> values = new TreeSet<>();
>   values.add(pst);
>   values.add(gmt);
>   assertThat(values, hasSize(2));
> } {code}
> Run:
> {code:java}
> ./gradlew :core:test \
>   --tests 
> org.apache.calcite.rex.RexBuilderTest.testTimestampWithTimeZoneStringNaturalOrderingKeepsDistinctValues{code}
> Observed behavior:
> The test fails because the second distinct value is dropped by the natural 
> ordering
> {code:java}
> Expected: a collection with size <2>
>      but: collection size was <1> {code}
> Expected behavior:
> If two TimestampWithTimeZoneString instances are not equal as values, the 
> natural ordering should not collapse them into a single sorted-set or 
> sorted-map key.
>  
> Even if ordering by instant is intentional, exposing it as the class's 
> natural ordering while equals() preserves zone-qualified text is unsafe. In 
> practice, sorted collections keyed on this class deduplicate distinct 
> timestamp-with-zone literals that happen to denote the same instant.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to