[ https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646177#comment-16646177 ]

Li Yuanjian commented on SPARK-10816:
-------------------------------------

Thanks [~zsxwing] for your comments and the discussion, and many thanks to [~kabhwan] for the comparison work. Sorry for the inactivity; our team is back at work this week. We are also comparing our approach and trying HWX's patch internally. Hopefully we can solve this problem together.
{quote}If I read the code correctly, [https://github.com/apache/spark/pull/22583] is [1].
[https://github.com/apache/spark/pull/22482] is a combination of [2] and [3] but still needs to load all values of a key into memory at the same time.
{quote}
Yes, our approach is [1]; we will add the comparison to the design doc. We originally chose this approach mainly for its performance and code simplicity.
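For context on the memory concern quoted above, here is a minimal Python sketch of event-time sessionization in general. It is not the code from either PR. It assumes input already sorted by (key, event time), integer timestamps, and half-open [start, end) sessions where end is the latest event time plus the gap; `Session` and `sessionize` are hypothetical names. Because the input is sorted, only the currently open session is held in memory rather than all values of a key.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass
class Session:
    key: str
    start: int  # event time of the first event in the session
    end: int    # exclusive end: latest event time + gap
    count: int  # number of events merged into the session


def sessionize(events: Iterable[Tuple[str, int]], gap: int) -> Iterator[Session]:
    """Merge (key, event_time) pairs into gap-based sessions.

    Assumption (hypothetical helper): events arrive sorted by (key, event_time),
    so a single open session per pass is enough state.
    """
    current: Optional[Session] = None
    for key, ts in events:
        if current is not None and key == current.key and ts < current.end:
            # Event falls inside the open session's [start, end) range: extend it.
            # Input is sorted, so ts + gap never moves the end backwards.
            current.end = ts + gap
            current.count += 1
        else:
            # New key, or the gap elapsed: emit the finished session, open a new one.
            if current is not None:
                yield current
            current = Session(key, ts, ts + gap, 1)
    if current is not None:
        yield current


if __name__ == "__main__":
    events = [("a", 0), ("a", 5), ("a", 30), ("b", 2)]
    for s in sessionize(events, gap=10):
        print(s)
```

With a gap of 10, key "a" yields two sessions, [0, 15) with two events and [30, 40) with one, and key "b" yields [2, 12). A real streaming engine also has to handle out-of-order events and watermarks, which this sketch deliberately ignores.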
{quote}Since Baidu’s patch supports Complete mode and Append mode, I ran the benchmark in Append mode when comparing HWX’s patch with Baidu’s patch.

Baidu’s patch cannot keep up with an input rate of 200 (it showed a maximum of around 130 processed rows per second), while HWX’s patch (Append mode) can keep up with an input rate of around 23000.

(The initial input rate was 1000, but Baidu’s patch slowed down considerably, consistently logging "Reached spill threshold of 4096 rows, switching to org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter".)
{quote}
As mentioned in the [discussion|https://docs.google.com/document/d/1hdh6GNLzprzlSJDDa4UKMyNQ9-u_H-CDpEtvtggxCL0/edit?disco=AAAACOOWOas], the bug that caused an unnecessary shuffle has been fixed. We'll run the benchmark and share the results in your doc.

> EventTime based sessionization
> ------------------------------
>
>                 Key: SPARK-10816
>                 URL: https://issues.apache.org/jira/browse/SPARK-10816
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>            Reporter: Reynold Xin
>            Priority: Major
>         Attachments: SPARK-10816 Support session window natively.pdf, Session Window Support For Structure Streaming.pdf
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
