Jungtaek Lim created SPARK-34892:
------------------------------------

             Summary: Introduce MergingSortWithSessionWindowStateIterator 
sorting input rows and rows in state efficiently
                 Key: SPARK-34892
                 URL: https://issues.apache.org/jira/browse/SPARK-34892
             Project: Spark
          Issue Type: Sub-task
          Components: Structured Streaming
    Affects Versions: 3.2.0
            Reporter: Jungtaek Lim


This issue tracks the effort to introduce 
MergingSortWithSessionWindowStateIterator, which will efficiently ensure the 
sort order between input rows and rows in state. 
MergingSortWithSessionWindowStateIterator requires as a precondition that input 
rows are sorted, and assumes that the number of rows in state per group key is 
small. As the name suggests, the iterator performs a merge sort between the two 
and provides elements one by one.
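
The merge described above can be sketched roughly as follows. This is only an 
illustration in Python, not the Spark implementation: the row shape 
(key, session start, session end), the function name merge_sorted_with_state, 
and the load_state callback are all hypothetical. The sketch assumes the input 
is already sorted by (key, session start), that state rows for a key come back 
sorted by session start, and that state is only read for keys that appear in 
the input.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

# Hypothetical row shape: (group key, session start, session end).
Row = Tuple[str, int, int]


def merge_sorted_with_state(
    sorted_input: Iterable[Row],
    load_state: Callable[[str], List[Row]],
) -> Iterator[Row]:
    """Lazily yield input rows and state rows ordered by (key, session start).

    Assumes sorted_input is sorted by (key, start) and that load_state(key)
    returns the (small) list of state rows for that key, sorted by start.
    """
    current_key: Optional[str] = None
    state_rows: List[Row] = []
    state_pos = 0

    for row in sorted_input:
        key = row[0]
        if key != current_key:
            # New group key: flush what is left of the previous key's state,
            # then load the state rows for the new key once.
            yield from state_rows[state_pos:]
            current_key = key
            state_rows = load_state(key)
            state_pos = 0

        # Emit every state row that sorts at or before the current input row.
        while state_pos < len(state_rows) and state_rows[state_pos][1] <= row[1]:
            yield state_rows[state_pos]
            state_pos += 1

        yield row

    # Flush the remaining state rows of the last key.
    yield from state_rows[state_pos:]
```

Because state rows are loaded one key at a time and only compared against the 
head of the input, the extra memory is bounded by the number of state rows per 
key, which is why the "small state per group key" assumption matters.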

The precondition will be guaranteed by the physical node, and the assumption is 
most likely true unless the watermark gap is set to something like hours and 
there are a lot of old but not late input rows.



