Yuming Wang created SPARK-39069:
-----------------------------------
Summary: Simplify another conditionals case in predicate
Key: SPARK-39069
URL: https://issues.apache.org/jira/browse/SPARK-39069
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.4.0
Reporter: Yuming Wang
{code:scala}
sql(
  """
    |CREATE TABLE t1 (
    |  id DECIMAL(18,0),
    |  event_dt DATE,
    |  cmpgn_run_dt DATE)
    |USING parquet
    |PARTITIONED BY (cmpgn_run_dt)
  """.stripMargin)
sql(
  """
    |select count(*)
    |from t1
    |where CMPGN_RUN_DT >= date_sub(EVENT_DT,2) and CMPGN_RUN_DT <= EVENT_DT
    |and EVENT_DT ='2022-04-05'
    |;
  """.stripMargin).explain(true)
{code}
Expected:
{noformat}
== Optimized Logical Plan ==
Aggregate [count(1) AS count(1)#4L]
+- Project
   +- Filter (((isnotnull(CMPGN_RUN_DT#3) AND (CMPGN_RUN_DT#3 >= 2022-04-03)) AND (CMPGN_RUN_DT#3 <= 2022-04-05)) AND (EVENT_DT#2 = 2022-04-05))
      +- Relation default.t1[id#1,event_dt#2,cmpgn_run_dt#3] parquet
== Physical Plan ==
*(2) HashAggregate(keys=[], functions=[count(1)], output=[count(1)#4L])
+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#31]
   +- *(1) HashAggregate(keys=[], functions=[partial_count(1)], output=[count#7L])
      +- *(1) Project
         +- *(1) Filter (EVENT_DT#2 = 2022-04-05)
            +- *(1) ColumnarToRow
               +- FileScan parquet default.t1[event_dt#2,cmpgn_run_dt#3] Batched: true, DataFilters: [(event_dt#2 = 2022-04-05)], Format: Parquet, Location: InMemoryFileIndex[], PartitionFilters: [isnotnull(cmpgn_run_dt#3), (cmpgn_run_dt#3 >= 2022-04-03), (cmpgn_run_dt#3 <= 2022-04-05)], PushedFilters: [EqualTo(event_dt,2022-04-05)], ReadSchema: struct<event_dt:date>, UsedIndexes: []
{noformat}
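The expected plan above is what one would get if the literal from EVENT_DT = '2022-04-05' were substituted into the other two conjuncts and date_sub were then folded (2022-04-05 minus 2 days is 2022-04-03), leaving pure partition-column predicates on cmpgn_run_dt. The following is only a minimal, self-contained sketch of that idea. The expression classes and the PropagateSketch object are hypothetical and are not Catalyst classes or Spark's actual optimizer rule.

```scala
import java.time.LocalDate

// Hypothetical mini expression tree for illustration; NOT Catalyst classes.
sealed trait Expr
case class Col(name: String) extends Expr
case class Lit(date: LocalDate) extends Expr
case class DateSub(start: Expr, days: Int) extends Expr
case class Eq(left: Expr, right: Expr) extends Expr
case class Ge(left: Expr, right: Expr) extends Expr
case class Le(left: Expr, right: Expr) extends Expr

object PropagateSketch {
  // Replace every reference to a column whose constant value is known.
  def substitute(e: Expr, known: Map[String, Lit]): Expr = e match {
    case Col(n)        => known.getOrElse(n, e)
    case DateSub(s, d) => DateSub(substitute(s, known), d)
    case Eq(l, r)      => Eq(substitute(l, known), substitute(r, known))
    case Ge(l, r)      => Ge(substitute(l, known), substitute(r, known))
    case Le(l, r)      => Le(substitute(l, known), substitute(r, known))
    case other         => other
  }

  // Fold date_sub when its input has become a literal.
  def fold(e: Expr): Expr = e match {
    case DateSub(s, d) =>
      fold(s) match {
        case Lit(date) => Lit(date.minusDays(d))
        case folded    => DateSub(folded, d)
      }
    case Eq(l, r) => Eq(fold(l), fold(r))
    case Ge(l, r) => Ge(fold(l), fold(r))
    case Le(l, r) => Le(fold(l), fold(r))
    case other    => other
  }

  // Simplify the conjuncts of an AND-only predicate.
  def simplify(conjuncts: Seq[Expr]): Seq[Expr] = {
    // Collect "column = literal" conjuncts as known constants.
    val known = conjuncts.collect { case Eq(Col(n), l: Lit) => n -> l }.toMap
    conjuncts.map {
      case defining @ Eq(Col(_), _: Lit) => defining // keep the equality itself
      case other                         => fold(substitute(other, known))
    }
  }

  // The WHERE clause from this issue, modelled with the sketch classes.
  val issuePredicate: Seq[Expr] = Seq(
    Ge(Col("cmpgn_run_dt"), DateSub(Col("event_dt"), 2)),
    Le(Col("cmpgn_run_dt"), Col("event_dt")),
    Eq(Col("event_dt"), Lit(LocalDate.parse("2022-04-05"))))
}
```

Calling PropagateSketch.simplify(PropagateSketch.issuePredicate) should leave the equality on event_dt untouched and turn the other two conjuncts into comparisons of cmpgn_run_dt against the literals 2022-04-03 and 2022-04-05, which mirrors the PartitionFilters in the expected physical plan. The sketch only handles top-level AND conjuncts; nullability, OR branches, and non-deterministic expressions are deliberately ignored.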