[ https://issues.apache.org/jira/browse/SPARK-30657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17025575#comment-17025575 ]
Dongjoon Hyun commented on SPARK-30657:
---------------------------------------
Hi, [~tdas]. Could we add `2.4.5` to `Target Version` as well?
> Streaming limit after streaming dropDuplicates can throw error
> --------------------------------------------------------------
>
> Key: SPARK-30657
> URL: https://issues.apache.org/jira/browse/SPARK-30657
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2,
> 2.4.3, 2.4.4
> Reporter: Tathagata Das
> Assignee: Tathagata Das
> Priority: Critical
>
> {{LocalLimitExec}} does not fully consume the iterator of its child plan. So
> if a limit follows a stateful operator such as streaming dedup in append
> mode (e.g. {{streamingdf.dropDuplicates().limit(5)}}), the state changes of
> the streaming dedup may never be committed, because most stateful operators
> commit state changes only after the generated iterator is fully consumed.
> The next batch then fails with {{java.lang.IllegalStateException: Error
> reading delta file .../N.delta does not exist}}, because the state store
> delta file was never generated.
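
The mechanism described above can be illustrated outside Spark with plain
Python iterators. This is only a rough sketch, not Spark code: FakeStateStore,
dedup, local_limit and draining_limit are hypothetical names invented for the
illustration. It assumes a stateful operator that commits only once its output
iterator has been exhausted, and compares a limit that stops early with one
that drains its child after emitting the first n rows.

```python
from itertools import islice


class FakeStateStore:
    """Hypothetical stand-in for a state store; not Spark's API."""

    def __init__(self):
        self.committed_versions = []

    def commit(self, version):
        self.committed_versions.append(version)


def dedup(rows, store, version):
    """Yield each distinct row once; commit only after the input is exhausted."""
    seen = set()
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row
    # Reached only when the caller drains this generator completely.
    store.commit(version)


def local_limit(rows, n):
    """Take the first n rows and stop, leaving the child partially consumed."""
    return islice(rows, n)


def draining_limit(rows, n):
    """Yield the first n rows, then drain the child so its commit step runs."""
    it = iter(rows)
    for row in islice(it, n):
        yield row
    for _ in it:
        pass  # discard remaining rows; only the side effect matters


batch = [1, 1, 2, 3, 4]

early_store = FakeStateStore()
early_rows = list(local_limit(dedup(batch, early_store, 1), 2))
print(early_rows, early_store.committed_versions)

drained_store = FakeStateStore()
drained_rows = list(draining_limit(dedup(batch, drained_store, 1), 2))
print(drained_rows, drained_store.committed_versions)
```

In the first case no version is ever committed, which corresponds to the
missing delta file on the next batch. Draining the child is shown here only as
one possible remedy under these assumptions; it is not presented as the fix
adopted for this issue.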