[
https://issues.apache.org/jira/browse/SPARK-30657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027910#comment-17027910
]
Tathagata Das commented on SPARK-30657:
---------------------------------------
This fix by itself (separate from the fix for SPARK-30658) may be backported.
The solution I implemented, always injecting StreamingLocalLimitExec, is safe
from a correctness point of view, but is a little risky from a performance
point of view (which I tried to minimize using the optimization). With 2.4.4+,
unless this is a serious bug that affects many users, I don't think we should
backport it. And I don't think limit on streaming is used extensively enough
for this to be a big bug (it has not been reported for 1.5 years).
What do you think, [~zsxwing]?
> Streaming limit after streaming dropDuplicates can throw error
> --------------------------------------------------------------
>
> Key: SPARK-30657
> URL: https://issues.apache.org/jira/browse/SPARK-30657
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2,
> 2.4.3, 2.4.4
> Reporter: Tathagata Das
> Assignee: Tathagata Das
> Priority: Critical
> Fix For: 3.0.0
>
>
> {{LocalLimitExec}} does not consume the iterator of the child plan. So if
> there is a limit after a stateful operator like streaming dedup in append
> mode (e.g. {{streamingdf.dropDuplicates().limit(5)}}), the state changes of
> the streaming dedup may not be committed (most stateful ops commit state
> changes only after the generated iterator is fully consumed). This leads to
> the next batch failing with {{java.lang.IllegalStateException: Error reading
> delta file .../N.delta does not exist}} as the state store delta file was
> never generated.
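The hazard described above can be sketched in plain Python (hypothetical names, not actual Spark code): a stateful generator that commits its state only after being fully drained, paired with a limit-style consumer that stops pulling early, mirrors how LocalLimitExec can leave a stateful operator's commit step unreached.

```python
def stateful_op(rows, state):
    # Mimics a stateful streaming operator: the commit happens only
    # after the generated iterator is fully consumed.
    for row in rows:
        state["seen"] = state.get("seen", 0) + 1
        yield row
    state["committed"] = True  # never reached if the consumer stops early


def local_limit(rows, n):
    # Mimics LocalLimitExec: stops pulling after n rows, leaving the
    # child iterator only partially consumed.
    for i, row in enumerate(rows):
        if i >= n:
            break
        yield row


state = {}
out = list(local_limit(stateful_op(range(10), state), 5))
# "committed" is absent from state: the commit step never ran, which
# corresponds to the state store delta file never being written.
```

Here the limit pulls one row past its cutoff before breaking, but the loop body in stateful_op never completes, so the "commit" line never executes; the always-injected StreamingLocalLimitExec fix is, roughly, a limit that still drains its child.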