discord9 commented on code in PR #22837:
URL: https://github.com/apache/datafusion/pull/22837#discussion_r3394090769


##########
datafusion/optimizer/tests/optimizer_integration.rs:
##########
@@ -795,7 +795,7 @@ fn extension_node_does_not_block_projection_pruning() -> 
Result<()> {
         Projection: t.a, CAST(t.ts AS Timestamp(ms, "UTC")) AS ts
           Filter: __common_expr_3 > TimestampMillisecond(1000, Some("UTC")) 
AND __common_expr_3 < TimestampMillisecond(2000, Some("UTC"))
             Projection: CAST(t.ts AS Timestamp(ms, "UTC")) AS __common_expr_3, 
t.a, t.ts
-              TableScan: t projection=[a, ts], partial_filters=[t.ts > 
TimestampNanosecond(1000000000, None), t.ts < TimestampNanosecond(2000000000, 
None), CAST(t.ts AS Timestamp(ms, "UTC")) > TimestampMillisecond(1000, 
Some("UTC")), CAST(t.ts AS Timestamp(ms, "UTC")) < TimestampMillisecond(2000, 
Some("UTC"))]
+              TableScan: t projection=[a, ts], partial_filters=[CAST(t.ts AS 
Timestamp(ms, "UTC")) > TimestampMillisecond(1000, Some("UTC")), CAST(t.ts AS 
Timestamp(ms, "UTC")) < TimestampMillisecond(2000, Some("UTC"))]

Review Comment:
   ~actually the 1_000_000_001ns basically only allow `>=` and `<` to unwrap 
cast(FOR POSITIVE timestamp), remove them from block list now~
   | Op | Same-op rewrite safe? | Example value | Why results match / differ |
   |---|---:|---:|---|
   | `=` | ❌ No | `1_000_000_001ns` | Before: value truncates to `1000ms`, so 
`1000 = 1000` is true. After: `1_000_000_001 = 1_000_000_000` is false. |
   | `!=` | ❌ No | `1_000_000_001ns` | Before: both are in the `1000ms` bucket, 
so `1000 != 1000` is false. After: exact ns values differ, so true. |
   | `>` | ❌ No | `1_000_000_001ns` | Before: value still truncates to 
`1000ms`, so `1000 > 1000` is false. After: `1_000_000_001 > 1_000_000_000` is 
true. |
   | `>=` | ✅ Yes for this positive bucket boundary | N/A | It matches the 
lower edge of the bucket: values below `1_000_000_000ns` cast below `1000ms`; 
values at/above it cast to `1000ms` or higher. |
   | `<` | ✅ Yes for this positive bucket boundary | N/A | It is the inverse of 
the lower-bound check: values before the bucket cast below `1000ms`; values 
inside or after the bucket do not. |
   | `<=` | ❌ No | `1_000_000_001ns` | Before: value truncates to `1000ms`, so 
`1000 <= 1000` is true. After: `1_000_000_001 <= 1_000_000_000` is false. |
   | `IS DISTINCT FROM` | ❌ No | `1_000_000_001ns` | Same as `!=` for non-null 
values: before false because both are `1000ms`; after true because exact ns 
values differ. |
   | `IS NOT DISTINCT FROM` | ❌ No | `1_000_000_001ns` | Same as `=` for 
non-null values: before true because both are `1000ms`; after false because 
exact ns values differ. |



##########
datafusion/expr-common/src/casts.rs:
##########
@@ -103,6 +103,35 @@ fn is_lossy_temporal_cast(from_type: &DataType, to_type: 
&DataType) -> bool {
         || (is_date_type(to_type) && from_type.is_temporal())
 }
 
+/// Returns true when casting a timestamp from `from_type` to `to_type` loses

Review Comment:
   > Why is this specific to timestamps? It seems like any narrowing cast would 
have the same problem for example cast(x as int) = 5 can't be unwrapped to `x = 
5.0` for floats
   
   ~was just trying to make this pr small since the last allow list one is too 
large, considering adding a longer block list~



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to