[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646177#comment-16646177 ]
Li Yuanjian commented on SPARK-10816:
-------------------------------------

Thanks [~zsxwing] for your comment and the discussion, and many thanks to [~kabhwan] for the comparison work. Sorry for the inactivity; the team is back at work this week. We are also comparing the two approaches and trying HWX's patch internally, and we hope we can solve this problem together.

{quote}If I read the codes correctly, [https://github.com/apache/spark/pull/22583] is [1]. [https://github.com/apache/spark/pull/22482] is a combination of [2] and [3] but still need to load all values of a key into the memory at the same time.
{quote}

Yes, our approach is [1]; we will add the comparison to the design doc. We initially chose this approach mainly for its performance and the simplicity of the code.

{quote}Since Baidu’s patch supports Complete mode and Append mode, I ran benchmark with Append mode while comparing HWX’s patch with Baidu’s patch. While Baidu’s patch cannot keep up input rate 200 (it showed max processed rows per second as around 130), HWX’s patch (APPEND mode) can keep the input rate around 23000. (Initial input rate was 1000 but Baidu’s patch got very slowed with consistently showing "Reached spill threshold of 4096 rows, switching to org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter".)
{quote}

As noted in the [discussion|https://docs.google.com/document/d/1hdh6GNLzprzlSJDDa4UKMyNQ9-u_H-CDpEtvtggxCL0/edit?disco=AAAACOOWOas], the bug that caused the useless shuffle has been fixed. We'll run the benchmark and share some results in your doc.
> EventTime based sessionization
> ------------------------------
>
>                 Key: SPARK-10816
>                 URL: https://issues.apache.org/jira/browse/SPARK-10816
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>            Reporter: Reynold Xin
>            Priority: Major
>         Attachments: SPARK-10816 Support session window natively.pdf, Session Window Support For Structure Streaming.pdf