Jungtaek Lim created SPARK-34892:
------------------------------------
Summary: Introduce MergingSortWithSessionWindowStateIterator
sorting input rows and rows in state efficiently
Key: SPARK-34892
URL: https://issues.apache.org/jira/browse/SPARK-34892
Project: Spark
Issue Type: Sub-task
Components: Structured Streaming
Affects Versions: 3.2.0
Reporter: Jungtaek Lim
This issue tracks the effort to introduce
MergingSortWithSessionWindowStateIterator, which will ensure the sort order
between input rows and rows in state in an efficient way.
MergingSortWithSessionWindowStateIterator will require, as a precondition, that
input rows are sorted, and will assume that the number of rows in state per
group key is small. As the name suggests, the iterator will do a merge sort
between the two and provide elements one by one.
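To make the idea concrete, here is a minimal Python sketch of such a merge. It is an illustration only, not the Spark implementation: the row shape (group key, session start), the function name merge_with_state, and the dict standing in for the state store are all assumptions made for this example. It assumes input rows arrive sorted by (group key, session start) and that the state rows of each key are already sorted and few.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical row shape for illustration: (group_key, session_start).
Row = Tuple[str, int]


def merge_with_state(
    input_rows: Iterable[Row],
    state: Dict[str, List[Row]],
) -> Iterator[Row]:
    """Lazily merge sorted input rows with the sorted state rows of each key.

    Assumes input_rows is sorted by (group_key, session_start) and that each
    list in `state` is sorted by session_start and small enough to hold in
    memory. Only keys that appear in the input are looked up in `state`.
    """
    current_key = None
    pending: List[Row] = []  # state rows of the current key
    pos = 0                  # index of the next state row not yet emitted

    for row in input_rows:
        key, start = row
        if key != current_key:
            # Flush the state rows left over from the previous key.
            yield from pending[pos:]
            current_key = key
            pending = state.get(key, [])
            pos = 0
        # Emit state rows that sort strictly before this input row.
        while pos < len(pending) and pending[pos][1] < start:
            yield pending[pos]
            pos += 1
        yield row

    # Flush the state rows left over from the last key.
    yield from pending[pos:]


input_rows = [("a", 1), ("a", 5), ("b", 2)]
state = {"a": [("a", 3), ("a", 9)], "b": [("b", 1)]}
merged = list(merge_with_state(input_rows, state))
```

Because both sides are already sorted, each element is emitted after a single comparison and only the state rows of the current key are held in memory, which is why the small-state assumption matters.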
The precondition will be guaranteed by the physical node, and the assumption is
most likely true unless the watermark gap is large (on the order of hours) and
there are many old but not late input rows.