[
https://issues.apache.org/jira/browse/SPARK-26655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532614#comment-17532614
]
Huw commented on SPARK-26655:
-----------------------------
I hit the guards in UnsupportedOperationChecker recently and reasoned that
the query would be sound as long as I was using append mode. Glad to see it's
being looked into.
I think this also applies to flatMapGroupsWithState, and specifically, the
error "flatMapGroupsWithState in append mode is not supported with $outputMode
output mode on a streaming DataFrame/Dataset".
> Support multiple aggregates in Structured Streaming append mode
> ---------------------------------------------------------------
>
> Key: SPARK-26655
> URL: https://issues.apache.org/jira/browse/SPARK-26655
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 3.1.0
> Reporter: Arun Mahadevan
> Priority: Major
> Attachments: Watermarks and multiple aggregates in Spark strucutred
> streaming_v1.pdf
>
>
> Right now multiple aggregates are not supported in structured streaming.
> However, in append mode, the aggregates are emitted only after the watermark
> passes the threshold (e.g. the window boundary) and the emitted value is not
> affected by further late data. So it is possible to chain multiple aggregates in
> 'Append' output mode without worrying about retractions.
> However the current event time watermarks in structured streaming are tracked
> at a global level and this does not work when aggregates are chained.
> We need to track the watermarks at individual operator level so that each
> operator can make progress independently and not rely on global min or max
> value.
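The point in the description about global versus per-operator watermarks can be illustrated with a toy model. This is only a sketch in plain Python, not Spark code and not how Spark implements watermarks: the names WindowedSum, run, and BATCHES are hypothetical, the watermark delay is assumed to be zero, and each finalized window is assumed to carry its window start as its event time. Two tumbling-window aggregates are chained (10 units wide, then 30 units wide), and each one emits a window only once its watermark has passed the window end, as in append mode.

```python
from collections import defaultdict


class WindowedSum:
    """Toy tumbling-window sum with append-mode style emission."""

    def __init__(self, size):
        self.size = size
        self.state = defaultdict(int)  # window start -> running sum
        self.dropped = 0               # rows rejected as late

    def process(self, rows, watermark):
        """Consume (event_time, value) rows; return finalized (window_start, sum) rows."""
        for ts, value in rows:
            start = ts - ts % self.size
            if start + self.size <= watermark:
                # The row's window is already closed, so the row counts as late.
                self.dropped += 1
                continue
            self.state[start] += value
        # Append mode: a window is emitted once, after the watermark passes its end.
        closed = sorted(s for s in self.state if s + self.size <= watermark)
        return [(s, self.state.pop(s)) for s in closed]


def run(batches, per_operator):
    """Chain a 10-wide and a 30-wide aggregate over micro-batches."""
    first, second = WindowedSum(10), WindowedSum(30)
    wm_first = wm_second = 0
    results = []
    for batch in batches:
        emitted = first.process(batch, wm_first)
        # Global tracking: the second operator reuses the first operator's watermark.
        # Per-operator tracking: it uses a watermark derived from its own input.
        wm = wm_second if per_operator else wm_first
        results += second.process(emitted, wm)
        # Watermarks advance after the batch to the max event time each operator saw.
        wm_first = max([wm_first] + [ts for ts, _ in batch])
        wm_second = max([wm_second] + [ts for ts, _ in emitted])
    return results, second.dropped


# Hypothetical input: five events fall into [0, 30), the rest only advance time.
BATCHES = [
    [(1, 1), (5, 1), (12, 1)],
    [(15, 1), (23, 1), (31, 1)],
    [(35, 1), (41, 1)],
    [(65, 1)],
    [(70, 1)],
]

if __name__ == "__main__":
    print("global:      ", run(BATCHES, per_operator=False))
    print("per-operator:", run(BATCHES, per_operator=True))
```

In this toy run, the shared watermark has already moved past the end of the 30-wide window by the time the first operator finalizes its later 10-wide windows, so the second operator rejects those rows as late and reports a sum of 2 for the window starting at 0. With a watermark tracked from its own input, the second operator accepts every upstream row and reports the full sum of 5.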