[
https://issues.apache.org/jira/browse/SPARK-29314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167644#comment-17167644
]
Sandeep Katta commented on SPARK-29314:
---------------------------------------
ya my bad, not required to backport
> ProgressReporter.extractStateOperatorMetrics should not overwrite updated as
> 0 when it actually runs a batch even with no data
> ------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-29314
> URL: https://issues.apache.org/jira/browse/SPARK-29314
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 2.4.4, 3.0.0
> Reporter: Jungtaek Lim
> Assignee: Jungtaek Lim
> Priority: Major
> Fix For: 3.0.0
>
>
> SPARK-24156 brought the ability to run a batch without actual data to enable
> fast state cleanup as well as emit evicted outputs without waiting actual
> data to come.
> This breaks some assumption on
> `ProgressReporter.extractStateOperatorMetrics`. See comment in source code:
> {code:java}
> // lastExecution could belong to one of the previous triggers if
> `!hasNewData`.
> // Walking the plan again should be inexpensive.
> {code}
> and newNumRowsUpdated is replaced to 0 if hasNewData is false. It makes sense
> if we copy progress from previous execution (which means no batch is run for
> this time), but after SPARK-24156 the precondition is broken.
> Spark should still replace the value of newNumRowsUpdated with 0 if there's
> no batch being run and it needs to copy the old value from previous
> execution, but it shouldn't touch the value if it runs a batch for no data.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]