[ https://issues.apache.org/jira/browse/SPARK-29314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Burak Yavuz reassigned SPARK-29314: ----------------------------------- Assignee: Jungtaek Lim > ProgressReporter.extractStateOperatorMetrics should not overwrite updated as > 0 when it actually runs a batch even with no data > ------------------------------------------------------------------------------------------------------------------------------ > > Key: SPARK-29314 > URL: https://issues.apache.org/jira/browse/SPARK-29314 > Project: Spark > Issue Type: Bug > Components: Structured Streaming > Affects Versions: 2.4.4, 3.0.0 > Reporter: Jungtaek Lim > Assignee: Jungtaek Lim > Priority: Major > > SPARK-24156 brought the ability to run a batch without actual data to enable > fast state cleanup as well as emit evicted outputs without waiting actual > data to come. > This breaks some assumption on > `ProgressReporter.extractStateOperatorMetrics`. See comment in source code: > {code:java} > // lastExecution could belong to one of the previous triggers if > `!hasNewData`. > // Walking the plan again should be inexpensive. > {code} > and newNumRowsUpdated is replaced to 0 if hasNewData is false. It makes sense > if we copy progress from previous execution (which means no batch is run for > this time), but after SPARK-24156 the precondition is broken. > Spark should still replace the value of newNumRowsUpdated with 0 if there's > no batch being run and it needs to copy the old value from previous > execution, but it shouldn't touch the value if it runs a batch for no data. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org