L. C. Hsieh created SPARK-46502:
-----------------------------------
Summary: Support timestamp types in UnwrapCastInBinaryComparison
Key: SPARK-46502
URL: https://issues.apache.org/jira/browse/SPARK-46502
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.0.0
Reporter: L. C. Hsieh
We have an optimization rule `UnwrapCastInBinaryComparison` that handles
similar cases, but it does not cover timestamp types.
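For context, here is a minimal spark-shell sketch (my own illustration; the
table and column names are not from this issue) of what the rule already does
for integral types: the cast is unwrapped from the column side so the
comparison can be pushed down.
```scala
// Hypothetical example of the existing behavior for integral types
// (table `t` and column `int_col` are made up for this sketch).
spark.sql("CREATE TABLE t (int_col INT) USING parquet")
// UnwrapCastInBinaryComparison removes the cast on the column side and the
// filter effectively becomes `int_col > 10`, which can reach the Parquet scan.
spark.sql("SELECT * FROM t WHERE CAST(int_col AS BIGINT) > 10L").explain(true)
```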
For a query plan like
```
== Analyzed Logical Plan ==
batch: timestamp
Project [batch#26466]
+- Filter (batch#26466 >= cast(2023-12-21 10:00:00 as timestamp))
   +- SubqueryAlias spark_catalog.default.timestamp_view
      +- View (`spark_catalog`.`default`.`timestamp_view`, [batch#26466])
         +- Project [cast(batch#26467 as timestamp) AS batch#26466]
            +- Project [cast(batch#26463 as timestamp) AS batch#26467]
               +- SubqueryAlias spark_catalog.default.table_timestamp
                  +- Relation spark_catalog.default.table_timestamp[batch#26463] parquet

== Optimized Logical Plan ==
Project [cast(batch#26463 as timestamp) AS batch#26466]
+- Filter (isnotnull(batch#26463) AND (cast(batch#26463 as timestamp) >= 2023-12-21 10:00:00))
   +- Relation spark_catalog.default.table_timestamp[batch#26463] parquet
```
The predicate compares a timestamp_ntz column with a literal value. Because the
column is wrapped in a cast to timestamp type, the (string) literal is also
wrapped with a cast to timestamp type. The cast over the literal is foldable, so
it is evaluated to a timestamp literal early on, and the predicate becomes
`cast(batch#26463 as timestamp) >= 2023-12-21 10:00:00`. Since the cast remains
on the column side, the predicate cannot be pushed down to the data source/table.
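A minimal spark-shell sketch that should reproduce the plan above; the DDL is
inferred from the plan, so the exact table/view definitions are assumptions,
not taken from this issue.
```scala
// Assumed repro: a parquet table with a TIMESTAMP_NTZ column and a view that
// casts it to TIMESTAMP (names follow the plan above).
spark.sql(
  """CREATE TABLE spark_catalog.default.table_timestamp (batch TIMESTAMP_NTZ)
    |USING parquet""".stripMargin)
spark.sql(
  """CREATE VIEW spark_catalog.default.timestamp_view AS
    |SELECT CAST(batch AS TIMESTAMP) AS batch
    |FROM spark_catalog.default.table_timestamp""".stripMargin)

// The filter is analyzed as `cast(batch as timestamp) >= <timestamp literal>`,
// so the cast stays on the column side and the predicate is not pushed down.
spark.sql(
  """SELECT batch FROM spark_catalog.default.timestamp_view
    |WHERE batch >= '2023-12-21 10:00:00'""".stripMargin).explain(true)
```
If `UnwrapCastInBinaryComparison` handled this pattern, the cast could
presumably be moved to the literal side instead (comparing `batch` directly
against a timestamp_ntz literal), so the comparison could be pushed down to the
Parquet scan.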