Jungtaek Lim created SPARK-50046:
------------------------------------

             Summary: [Possible bug] Incorrect watermark advancement if watermark node is lost/pruned
                 Key: SPARK-50046
                 URL: https://issues.apache.org/jira/browse/SPARK-50046
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 4.0.0
            Reporter: Jungtaek Lim


This does not happen with the current optimization rules, but that is mostly
luck: we were already silently dropping the CollectMetrics node, so it would be
ideal to address the issue before it surfaces.

WatermarkTracker only looks at the physical plan when calculating the new
watermark value. It identifies each watermark node by its index, so several
issues arise when a watermark node is lost during optimization:

1) The watermark advances even when one of the watermark nodes has been dropped
(the dropped node should instead be treated as having received no data).

2) WatermarkTracker incorrectly updates its in-memory map of each watermark
node's previous value, because the index is not a stable key.
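To make issue 2 concrete, here is a minimal, self-contained toy model in
Python. It is not Spark's WatermarkTracker: the class, the `node_id` field, and
the batch values are hypothetical, and it assumes the "min" multiple-watermark
policy and a global watermark that never moves backwards. Running it once keyed
by plan index and once keyed by a stable node id shows how an index shift after
pruning can push the global watermark too far.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class WatermarkNode:
    """Hypothetical stand-in for a watermark operator found in the executed plan."""
    node_id: str                   # hypothetical stable identifier for the node
    max_event_time: Optional[int]  # max event time seen in this batch; None means no data
    delay: int                     # watermark delay, in the same unit as event time


class ToyWatermarkTracker:
    """Remembers the last watermark per node and combines them with the "min" policy."""

    def __init__(self, key_by_index: bool) -> None:
        self.key_by_index = key_by_index
        self.node_watermarks: Dict[Union[int, str], int] = {}
        self.global_watermark = 0

    def update(self, plan_nodes: List[WatermarkNode]) -> int:
        for index, node in enumerate(plan_nodes):
            # The only difference between the two modes: which key identifies a node.
            key: Union[int, str] = index if self.key_by_index else node.node_id
            if node.max_event_time is None:
                # No data for this node: keep its previous value, or start it at 0.
                self.node_watermarks.setdefault(key, 0)
                continue
            candidate = node.max_event_time - node.delay
            previous = self.node_watermarks.get(key)
            # A per-node watermark only moves forward.
            if previous is None or candidate > previous:
                self.node_watermarks[key] = candidate
        # Nodes missing from this plan are not touched, so they keep their old values.
        if self.node_watermarks:
            chosen = min(self.node_watermarks.values())
            # The global watermark never moves backwards.
            self.global_watermark = max(self.global_watermark, chosen)
        return self.global_watermark


def run(key_by_index: bool) -> List[int]:
    tracker = ToyWatermarkTracker(key_by_index)
    batches = [
        # Batch 1: both watermark nodes are in the plan and see data.
        [WatermarkNode("a", 100, 10), WatermarkNode("b", 50, 0)],
        # Batch 2: node "a" was pruned by optimization, so "b" now sits at index 0.
        [WatermarkNode("b", 200, 0)],
        # Batch 3: both nodes are back; only "b" sees new data.
        [WatermarkNode("a", None, 10), WatermarkNode("b", 210, 0)],
    ]
    return [tracker.update(batch) for batch in batches]


print(run(key_by_index=True))   # [50, 50, 200]
print(run(key_by_index=False))  # [50, 90, 90]
```

With index keys, node "b"'s value lands in node "a"'s slot in batch 2, so by
batch 3 the global watermark reaches 200 although node "a" never progressed
beyond 90; keying by the stable id keeps it at 90. In this toy model a node that
is pruned before it is ever seen never gets an entry at all, so stable keys
alone would not cover that case of issue 1.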



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
