[ https://issues.apache.org/jira/browse/SPARK-30657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shixiong Zhu updated SPARK-30657:
---------------------------------
    Fix Version/s: 3.0.0

> Streaming limit after streaming dropDuplicates can throw error
> --------------------------------------------------------------
>
>                 Key: SPARK-30657
>                 URL: https://issues.apache.org/jira/browse/SPARK-30657
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>            Priority: Critical
>             Fix For: 3.0.0
>
>
> {{LocalLimitExec}} does not fully consume the iterator of its child plan. So if
> there is a limit after a stateful operator such as streaming dedup in append
> mode (e.g. {{streamingdf.dropDuplicates().limit(5)}}), the state changes made by
> the streaming dedup may not be committed, because most stateful operators commit
> their state changes only after the generated iterator is fully consumed. This
> leads to the next batch failing with {{java.lang.IllegalStateException: Error
> reading delta file .../N.delta does not exist}}, as the state store delta file
> was never generated.
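
The mechanism described above can be illustrated without Spark. The following is a minimal pure-Python sketch, not Spark code: `ToyStateStore`, `dedup`, `local_limit`, and `draining_limit` are hypothetical names invented for this illustration. It assumes only what the description states, namely that the stateful operator commits after its output iterator is exhausted and that the limit stops pulling early. The draining variant shows one possible remedy and is not claimed to be the change that went into 3.0.0.

```python
from itertools import islice


class ToyStateStore:
    """Hypothetical stand-in for a versioned state store (not Spark's API)."""

    def __init__(self):
        self.seen = set()
        self.committed_versions = []

    def commit(self, version):
        # In the real system this is where the delta file would be written.
        self.committed_versions.append(version)


def dedup(rows, store, version):
    """Yield rows not seen before; commit only once the input is exhausted."""
    for row in rows:
        if row not in store.seen:
            store.seen.add(row)
            yield row
    # Reached only if the consumer drains this generator completely.
    store.commit(version)


def local_limit(rows, n):
    """Take the first n rows and stop pulling from the child afterwards."""
    return islice(rows, n)


def draining_limit(rows, n):
    """Emit the first n rows, then keep pulling so the child can finish."""
    it = iter(rows)
    yield from islice(it, n)
    for _ in it:  # discard the remaining rows; the child reaches its commit
        pass


batch = [1, 1, 2, 3, 4, 5, 6, 7]

store = ToyStateStore()
limited = list(local_limit(dedup(batch, store, version=1), 5))
print(limited, store.committed_versions)

fixed_store = ToyStateStore()
drained = list(draining_limit(dedup(batch, fixed_store, version=1), 5))
print(drained, fixed_store.committed_versions)
```

In the first run the limit returns five rows but version 1 is never committed, which corresponds to the missing delta file that the next batch would try to read. In the second run the same five rows come out and the commit happens, at the cost of scanning the rest of the input.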