[ https://issues.apache.org/jira/browse/SPARK-42376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685315#comment-17685315 ]
Jungtaek Lim commented on SPARK-42376:
--------------------------------------
Will submit a PR soon.
> Introduce watermark propagation among operators
> -----------------------------------------------
>
> Key: SPARK-42376
> URL: https://issues.apache.org/jira/browse/SPARK-42376
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.5.0
> Reporter: Jungtaek Lim
> Priority: Major
>
> With the introduction of SPARK-40925, we enabled workloads containing multiple
> stateful operators in a single streaming query.
> That JIRA ticket explicitly stated what was out of scope: "Here we propose fixing the
> late record filtering in stateful operators to allow chaining of stateful
> operators {*}which do not produce delayed records (like time-interval join or
> potentially flatMapGroupsWithState){*}".
> We have identified a production use case for a stream-stream time-interval join
> followed by a stateful operator (e.g. window aggregation), and propose to
> address that use case via this ticket.
> The design will be described in the PR, but the basic idea is to simulate
> watermark propagation among operators. As of now, Spark
> assumes that all stateful operators share the same input watermark and output
> watermark, which is the source of the limitation. With this ticket, we build
> the logic to simulate watermark propagation so that each operator can have
> its own (input watermark, output watermark) pair. Operators that introduce delayed
> records will produce a delayed output watermark, and a downstream operator can
> take the delay into account because its input watermark is adjusted accordingly.
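
The propagation idea in the description above can be illustrated with a small, self-contained sketch. This is not Spark's implementation or API: the `Operator` class, the `propagate` function, and the fixed `delay_ms` per operator are hypothetical names and simplifications chosen only for illustration. The sketch assumes that an operator's input watermark is the minimum of its upstream operators' output watermarks, and that its output watermark is its input watermark minus whatever delay the operator introduces.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Operator:
    """Hypothetical stand-in for a stateful operator in a query plan."""
    name: str
    upstream: List[str] = field(default_factory=list)  # names of upstream operators
    delay_ms: int = 0  # assumed fixed delay this operator adds to its output


def propagate(operators: List[Operator],
              source_watermark_ms: int) -> Dict[str, Tuple[int, int]]:
    """Return {operator name: (input watermark, output watermark)}.

    `operators` must be listed upstream-first (topological order).
    """
    watermarks: Dict[str, Tuple[int, int]] = {}
    for op in operators:
        if op.upstream:
            # Input watermark: the slowest (minimum) upstream output watermark.
            input_wm = min(watermarks[u][1] for u in op.upstream)
        else:
            # Operators with no upstream read the source watermark directly.
            input_wm = source_watermark_ms
        # Output watermark: held back by the delay this operator introduces.
        output_wm = input_wm - op.delay_ms
        watermarks[op.name] = (input_wm, output_wm)
    return watermarks


# A time-interval join (assumed here to delay output by 60 s)
# followed by a window aggregation that adds no delay of its own.
plan = [
    Operator("join", delay_ms=60_000),
    Operator("agg", upstream=["join"]),
]
result = propagate(plan, source_watermark_ms=1_000_000)
print(result)
```

With these assumed numbers, the join sees an input watermark of 1,000,000 and emits an output watermark of 940,000, and the aggregation then uses 940,000 as its input watermark instead of the global 1,000,000, so records delayed by the join are not dropped as late.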