[ 
https://issues.apache.org/jira/browse/SPARK-57748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-57748.
------------------------------
    Fix Version/s: 4.3.0
       Resolution: Fixed

Issue resolved by pull request 56888
[https://github.com/apache/spark/pull/56888]

> Use a dedicated tree-pattern bit for the TIME -> TIMESTAMP_NTZ cast rewrite in ComputeCurrentTime
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-57748
>                 URL: https://issues.apache.org/jira/browse/SPARK-57748
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Assignee: Anupam Yadav
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.3.0
>
>
> h2. Summary
> Replace the broad {{containsPattern(CAST)}} pruning condition in the
> {{ComputeCurrentTime}} optimizer rule with a dedicated tree-pattern bit, so
> the rule only descends into plans that actually contain a cast to the
> {{TIMESTAMP_NTZ}} family instead of any plan that contains any cast.
> h2. Background
> SPARK-57618 added {{CAST(TIME(p) AS TIMESTAMP_NTZ(q))}}, whose date fields
> come from {{CURRENT_DATE}}. To keep the value query-stable,
> {{ComputeCurrentTime}} rewrites such casts into a date+time builder anchored
> on the same current-date literal as {{current_date()}}.
> For the rule to visit those casts, its pruning predicate was widened to:
> {code:scala}
> bits.containsPattern(CURRENT_LIKE) || bits.containsPattern(CAST)
> {code}
> Tagging the {{Cast}} with {{CURRENT_LIKE}} was rejected earlier because that
> pattern has shared semantics (e.g. inline-table validation in
> {{EvaluateUnresolvedInlineTable}} treats {{CURRENT_LIKE}} expressions as safe
> to defer, which would let unrelated non-foldable NTZ-target casts such as
> {{CAST(rand() AS TIMESTAMP_NTZ)}} bypass validation).
> The {{CAST}} fallback is correct but defeats pruning: casts are present in
> almost every query, so {{ComputeCurrentTime}} now traverses the full
> expression tree of essentially every plan even though the
> {{TIME -> TIMESTAMP_NTZ}} rewrite fires rarely.
> h2. Proposal
> Introduce a dedicated {{TreePattern}} (e.g. {{CAST_TO_TIMESTAMP_NTZ}}) and:
> * Tag it in {{Cast.nodePatternsInternal}} keyed on the *target* type only
> (the {{TIMESTAMP_NTZ}} / {{TIMESTAMP_NTZ(p)}} families), never on
> {{child.dataType}}.
> * Prune {{ComputeCurrentTime}} on {{CURRENT_LIKE || CAST_TO_TIMESTAMP_NTZ}}
> instead of {{CAST}}.
> The node-level {{Cast.isTimeToTimestampNTZ}} guard stays, so only
> {{TIME -> TIMESTAMP_NTZ}} casts are actually rewritten.
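
The difference between the two pruning predicates can be illustrated with a small self-contained toy model. This is only a sketch: the `Pattern`, `Expr`, `Cast`, and predicate definitions below are simplified stand-ins invented for illustration and are not Spark's actual `TreePattern`, `Expression`, or `Cast` classes. Only the pattern name `CAST_TO_TIMESTAMP_NTZ` is taken from the proposal.

```scala
object DedicatedPatternSketch {
  // Toy stand-ins for tree-pattern bits; not Spark's TreePattern enumeration.
  sealed trait Pattern
  case object CURRENT_LIKE extends Pattern
  case object CAST extends Pattern
  case object CAST_TO_TIMESTAMP_NTZ extends Pattern // name suggested in the proposal

  // Toy data types, just enough to distinguish cast targets.
  sealed trait DataType
  case object IntegerType extends DataType
  case object TimestampNTZType extends DataType

  sealed trait Expr {
    def children: Seq[Expr]
    def nodePatterns: Set[Pattern]
    // Union of this node's bits and the bits of its whole subtree.
    lazy val patterns: Set[Pattern] = nodePatterns ++ children.flatMap(_.patterns)
    def containsPattern(p: Pattern): Boolean = patterns.contains(p)
  }

  final case class Column(name: String) extends Expr {
    def children: Seq[Expr] = Nil
    def nodePatterns: Set[Pattern] = Set.empty
  }

  final case class CurrentDate() extends Expr {
    def children: Seq[Expr] = Nil
    def nodePatterns: Set[Pattern] = Set(CURRENT_LIKE)
  }

  final case class Cast(child: Expr, to: DataType) extends Expr {
    def children: Seq[Expr] = Seq(child)
    // Keyed on the target type only; the child is never inspected here.
    def nodePatterns: Set[Pattern] =
      if (to == TimestampNTZType) Set(CAST, CAST_TO_TIMESTAMP_NTZ) else Set(CAST)
  }

  // Widened predicate: true for any tree that contains any cast.
  def widenedPredicate(e: Expr): Boolean =
    e.containsPattern(CURRENT_LIKE) || e.containsPattern(CAST)

  // Proposed predicate: true only for current-like or NTZ-target casts.
  def dedicatedPredicate(e: Expr): Boolean =
    e.containsPattern(CURRENT_LIKE) || e.containsPattern(CAST_TO_TIMESTAMP_NTZ)
}
```

In this model, `Cast(Column("s"), IntegerType)` satisfies `widenedPredicate` but not `dedicatedPredicate`, so a rule pruned on the dedicated bit would skip it, while `Cast(Column("t"), TimestampNTZType)` is still visited.
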
> h2. Constraints / notes
> * The tag must be keyed on the target type, not the source:
> {{nodePatternsInternal}} is computed eagerly at {{Cast}} construction,
> before the child is resolved, and reading {{child.dataType}} there can throw
> even when {{child.resolved}} is true (e.g. an {{OuterReference}} wrapping an
> unresolved attribute, as in the {{makeSQLTableFunctionPlan}} /
> {{sql-udf.sql}} crash seen during SPARK-57618). The target type is always
> safe to read.
> * This still slightly over-tags (all NTZ-target casts, not only
> {{TIME -> TIMESTAMP_NTZ}} casts), but the bit is dedicated with no other
> consumers, so it cannot leak into inline-table validation, streaming
> {{CURRENT_LIKE}} handling, or {{ReplaceCurrentLike}}. Keying the tag
> precisely on the source type is not safely achievable at construction time.
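
The first constraint can be sketched with a toy model of eager tagging. All names here (`UnresolvedAttr`, `OuterRef`, `CastByTarget`, `CastBySource`) are hypothetical and written only for this illustration; they mimic the situation described above, where a wrapper reports itself as resolved while its type lookup still fails, and make no claim about how Spark's own classes are implemented.

```scala
object EagerTagSketch {
  sealed trait DataType
  case object StringType extends DataType
  case object TimeType extends DataType
  case object TimestampNTZType extends DataType

  sealed trait Expr {
    def resolved: Boolean
    def dataType: DataType
  }

  // Toy unresolved attribute: it has no type yet, so asking for one fails.
  final case class UnresolvedAttr(name: String) extends Expr {
    def resolved: Boolean = false
    def dataType: DataType =
      throw new IllegalStateException(s"dataType requested on unresolved attribute $name")
  }

  // Toy wrapper that reports resolved = true but delegates dataType to its child.
  final case class OuterRef(e: Expr) extends Expr {
    def resolved: Boolean = true
    def dataType: DataType = e.dataType
  }

  // Tag computed eagerly at construction from the target type only: always safe.
  final case class CastByTarget(child: Expr, to: DataType) {
    val tagged: Boolean = to == TimestampNTZType
  }

  // Tag computed eagerly from the source type: the constructor itself can throw,
  // even though the child claims to be resolved.
  final case class CastBySource(child: Expr, to: DataType) {
    val tagged: Boolean =
      child.resolved && child.dataType == TimeType && to == TimestampNTZType
  }
}
```

With `val tricky = OuterRef(UnresolvedAttr("x"))`, constructing `CastByTarget(tricky, TimestampNTZType)` succeeds and is tagged, whereas constructing `CastBySource(tricky, TimestampNTZType)` throws from the constructor. The price of the safe variant is the over-tagging noted in the second bullet.
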
> h2. Testing
> * Keep the existing inline-table regression test
> ({{CAST(rand() AS TIMESTAMP_NTZ)}} still rejected).
> * Add a {{ComputeCurrentTimeSuite}} assertion that a plan whose only casts
> are unrelated (e.g. {{string -> int}}) is left untouched, while
> {{TIME -> TIMESTAMP_NTZ}} is still rewritten with a date literal consistent
> with {{current_date()}}.
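
The shape of the second assertion can be sketched against a toy rewrite rule. This is not the `ComputeCurrentTimeSuite` test and does not use Spark's test helpers; `rewrite`, `DateLiteral`, and `MakeTimestampNTZ` are hypothetical names for a simplified model in which the current date is passed in explicitly.

```scala
import java.time.LocalDate

object ComputeCurrentTimeSketch {
  sealed trait DataType
  case object StringType extends DataType
  case object IntegerType extends DataType
  case object DateType extends DataType
  case object TimeType extends DataType
  case object TimestampNTZType extends DataType

  sealed trait Expr { def dataType: DataType }
  final case class Column(name: String, dataType: DataType) extends Expr
  final case class Cast(child: Expr, dataType: DataType) extends Expr
  final case class CurrentDate() extends Expr { def dataType: DataType = DateType }
  final case class DateLiteral(value: LocalDate) extends Expr {
    def dataType: DataType = DateType
  }
  // Toy date+time builder standing in for the rewritten form of the cast.
  final case class MakeTimestampNTZ(date: Expr, time: Expr) extends Expr {
    def dataType: DataType = TimestampNTZType
  }

  // Toy rule: replaces current_date() and TIME -> TIMESTAMP_NTZ casts using the
  // same `today` value, and leaves every other expression structurally unchanged.
  def rewrite(e: Expr, today: LocalDate): Expr = e match {
    case CurrentDate() => DateLiteral(today)
    case Cast(child, TimestampNTZType) if child.dataType == TimeType =>
      MakeTimestampNTZ(DateLiteral(today), rewrite(child, today))
    case Cast(child, to) => Cast(rewrite(child, today), to)
    case MakeTimestampNTZ(d, t) => MakeTimestampNTZ(rewrite(d, today), rewrite(t, today))
    case other => other
  }
}
```

A test in this style would check that `rewrite(Cast(Column("s", StringType), IntegerType), today)` equals its input, and that the date literal inside the rewritten `TIME -> TIMESTAMP_NTZ` cast equals the literal produced for `CurrentDate()` under the same `today`.
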



