Kazuyuki Tanimura created SPARK-53684:
-----------------------------------------
Summary: Spark is not simplifying some expressions for Iceberg
Key: SPARK-53684
URL: https://issues.apache.org/jira/browse/SPARK-53684
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.5.7, 4.1.0, 4.0.2
Reporter: Kazuyuki Tanimura
The Spark UI shows the following excerpt of the physical plan:
(5) Filter [codegen id : 1]
Input [16]: [a#241, b#243L, c#248, d#250, e#251, f#252, g#253L, h#258, i#259,
j#272, k#277, l#286, m#326, n#388, o#394, p#404]
Condition : ((((NOT b#243L IN (0,-1) AND CASE WHEN isnull(i#259) THEN false
WHEN (i#259 = 0) THEN false WHEN (i#259 = 1) THEN false WHEN (i#259 = 2) THEN
true ELSE false END) AND (isnull(o#394) OR ((NOT Contains(o#394, ML_) AND NOT
Contains(o#394, TDD_)) AND NOT Contains(o#394, POLICY_)))) AND (isnull(p#404)
OR (((NOT Contains(p#404, ML_) AND NOT Contains(p#404, TDD_)) AND NOT
Contains(p#404, POLICY_)) AND NOT Contains(p#404, NEW_USER_3_DAYS)))) AND
(date_format(gettimestamp(date_format(gettimestamp(date_format(cast(n#388 as
timestamp), yyyy-MM-dd, Some(UTC)), yyyy-MM-dd, TimestampType, Some(UTC),
false), yyyy-MM-dd, Some(UTC)), yyyy-MM-dd, TimestampType, Some(UTC), false),
yyyy-MM-dd, Some(UTC)) >= 2025-08-26))
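The last part of the filter condition could be simplified further, since every
nested date_format/gettimestamp round trip uses the same pattern (yyyy-MM-dd)
and the same time zone (UTC):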
(date_format(gettimestamp(date_format(gettimestamp(date_format(cast(n#388 as
timestamp), yyyy-MM-dd, Some(UTC)), yyyy-MM-dd, TimestampType, Some(UTC),
false), yyyy-MM-dd, Some(UTC)), yyyy-MM-dd, TimestampType, Some(UTC), false),
yyyy-MM-dd, Some(UTC)) >= 2025-08-26
-> date_format(cast(n#388 as timestamp), yyyy-MM-dd, Some(UTC)) >= 2025-08-26
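
To make the proposed reduction concrete, here is a minimal sketch of the rewrite on a toy expression tree. It is plain Python, not Spark code: the classes Col, DateFormat and GetTimestamp and the function simplify are hypothetical names invented for this illustration and do not correspond to Catalyst classes or to any existing optimizer rule. The sketch assumes the rewrite is only safe when the pattern and time zone match at all three levels and the pattern survives a format/parse round trip, as yyyy-MM-dd does.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Col:
    """Opaque leaf, e.g. 'cast(n#388 as timestamp)' (hypothetical model)."""
    name: str


@dataclass(frozen=True)
class DateFormat:
    """Toy stand-in for date_format(child, fmt, tz)."""
    child: "Expr"
    fmt: str
    tz: str


@dataclass(frozen=True)
class GetTimestamp:
    """Toy stand-in for gettimestamp(child, fmt, TimestampType, tz, false)."""
    child: "Expr"
    fmt: str
    tz: str


Expr = Union[Col, DateFormat, GetTimestamp]


def simplify(e: Expr) -> Expr:
    """Collapse date_format(gettimestamp(date_format(x))) to date_format(x).

    The rewrite fires only when all three nodes share the same pattern and
    time zone; otherwise the tree is rebuilt unchanged.
    """
    if isinstance(e, Col):
        return e
    child = simplify(e.child)  # simplify bottom-up
    if isinstance(e, DateFormat):
        if (
            isinstance(child, GetTimestamp)
            and isinstance(child.child, DateFormat)
            and e.fmt == child.fmt == child.child.fmt
            and e.tz == child.tz == child.child.tz
        ):
            return child.child  # the inner date_format already has this form
        return DateFormat(child, e.fmt, e.tz)
    return GetTimestamp(child, e.fmt, e.tz)


FMT, TZ = "yyyy-MM-dd", "UTC"
leaf = Col("cast(n#388 as timestamp)")

# Same nesting depth as the plan excerpt: three date_format, two gettimestamp.
nested = DateFormat(
    GetTimestamp(DateFormat(GetTimestamp(DateFormat(leaf, FMT, TZ), FMT, TZ), FMT, TZ), FMT, TZ),
    FMT,
    TZ,
)
simplified = simplify(nested)
```

Here simplified equals DateFormat(leaf, "yyyy-MM-dd", "UTC"), which matches the reduced form above. If any level used a different pattern or time zone, the sketch would leave that level untouched.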