[
https://issues.apache.org/jira/browse/SPARK-46064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun closed SPARK-46064.
---------------------------------
> EliminateEventTimeWatermark does not consider the fact that isStreaming flag
> can change for current child during resolution
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-46064
> URL: https://issues.apache.org/jira/browse/SPARK-46064
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 3.3.2, 3.4.1, 3.5.0, 4.0.0
> Reporter: Jungtaek Lim
> Assignee: Jungtaek Lim
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.2, 3.5.1, 4.0.0
>
>
> This looks like a long-standing bug.
> The object `EliminateEventTimeWatermark` is implemented as a rule, but it is
> not registered in the analyzer or optimizer. Instead, it is invoked directly
> when the withWatermark method is called, which means the rule is applied
> immediately against the child, regardless of whether the child is resolved.
> This is not an issue when only the DataFrame API is used, because streaming
> sources have the isStreaming flag set to true even before they are resolved.
> Mixing SQL and the DataFrame API exposes the issue, however: we may not know
> the exact value of the isStreaming flag on an unresolved node, and it is
> subject to change upon resolution.
> We should instead register EliminateEventTimeWatermark as a rule in analysis
> (or pre-optimization), and it should not apply the elimination if the child
> is not yet resolved.
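
The difference between the eager application and the proposed guard can be
sketched with a toy model. This is not Spark code: the Plan class, the node
names, and both helper functions are hypothetical stand-ins, and the sketch
assumes the rule drops a watermark node whenever its child reports
isStreaming = false.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    """Toy logical plan node (hypothetical; not Spark's LogicalPlan)."""
    name: str
    resolved: bool
    is_streaming: bool
    child: Optional["Plan"] = None


def with_watermark(child: Plan) -> Plan:
    # Wrap the child in a watermark node that mirrors the child's flags.
    return Plan("EventTimeWatermark", child.resolved, child.is_streaming, child)


def eliminate_eagerly(plan: Plan) -> Plan:
    # Behaviour described in the report: applied immediately, so the
    # isStreaming flag of an unresolved child is trusted as-is.
    if plan.name == "EventTimeWatermark" and not plan.child.is_streaming:
        return plan.child
    return plan


def eliminate_when_resolved(plan: Plan) -> Plan:
    # Proposed guard: only eliminate once the child is resolved, because
    # the flag on an unresolved node may still change.
    if (plan.name == "EventTimeWatermark"
            and plan.child.resolved
            and not plan.child.is_streaming):
        return plan.child
    return plan


# An unresolved node (for example, a relation referenced through SQL) whose
# isStreaming flag is not yet known and therefore reads as False.
unresolved = Plan("UnresolvedRelation", resolved=False, is_streaming=False)
# A resolved batch relation, where dropping the watermark is safe.
batch = Plan("BatchRelation", resolved=True, is_streaming=False)

eager = eliminate_eagerly(with_watermark(unresolved))
guarded = eliminate_when_resolved(with_watermark(unresolved))
guarded_batch = eliminate_when_resolved(with_watermark(batch))
```

In this model the eager variant returns the bare unresolved node, so the
watermark is lost before resolution can flip the flag, while the guarded
variant keeps the watermark node until the child is resolved and still
removes it for the resolved batch relation.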