[
https://issues.apache.org/jira/browse/SPARK-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672478#comment-16672478
]
Jungtaek Lim edited comment on SPARK-10816 at 11/2/18 2:49 AM:
---------------------------------------------------------------
UPDATE: I just discovered the performance critical issue on my patch (I guess
same issue is occurring on Baidu's patch) and just fixed yesterday.
1. Plenty Of Sessions In Key / Append Mode (rate: 500,000)
* HWX, Linked List version: max around 250,000
* HWX, Latest: max around 235,000
* flatMapGroupsWithState (with my state func. implementation): max around
250,000
They're showing CPU being maxed out. (Ran with local[3], and 3 cores of CPU are
all 100% user.)
When I increase rate to 1,000,000 I observed max rate would go up to around
350,000 for linked list version.
2. Plenty Of Keys / Append Mode (rate: 5,000,000)
* HWX, Linked List version: max around 3,700,000
* HWX, Latest: max around 4,000,000
* flatMapGroupsWithState (with my state func. implementation): max around
2,100,000
3. Plenty Of Rows In Session / Append Mode (rate: 50000)
* HWX, Linked List version: max around 27,000
* HWX, Latest: max around 36,000
* flatMapGroupsWithState (with my state func. implementation): max around 26,000
Please note that Linked list version doesn't materialize all sessions (even in
given key) into memory, so it gets rid of concern on loading sessions in
memory. It also shows close or even better performance than manual
implementation of flatMapGroupsWithState. Linked list version sometimes faster
than latest version (load all sessions in group memory) and sometimes slower.
I'll update my patch against linked list version. I guess there's no
performance issue now, now waiting for committers' review.
was (Author: kabhwan):
UPDATE: I just discovered the performance critical issue on my patch (I guess
same issue is occurring on Baidu's patch) and just fixed yesterday.
1. Plenty Of Sessions In Key / Append Mode (rate: 500,000)
* HWX, Linked List version: max around 250,000
* HWX, Latest: max around 235,000
* flatMapGroupsWithState (with my state func. implementation): max around
250,000
They're showing CPU being maxed out. (Ran with local[3], and 3 cores of CPU are
all 100% user.)
When I increase rate to 1,000,000 I observed max rate would go up to around
350,000 for linked list version.
2. Plenty Of Keys / Append Mode (rate: 5,000,000)
* HWX, Linked List version: max around 3,700,000
* HWX, Latest: max around 4,000,000
* flatMapGroupsWithState (with my state func. implementation): max around
2,100,000
3. Plenty Of Rows In Session / Append Mode (rate: 50000)
* HWX, Linked List version: max around 27,000
* HWX, Latest: max around 36,000
* flatMapGroupsWithState (with my state func. implementation): max around 26,000
Please note that Linked list version doesn't materialize all sessions (even in
given key) into memory, so it shows close or even better performance than
manual implementation of flatMapGroupsWithState, as well as get rid of concern
on loading sessions in memory. Linked list version sometimes faster than latest
version (load all sessions in group memory) and sometimes slower.
I'll update my patch against linked list version. I guess there's no
performance issue now, now waiting for committers' review.
> EventTime based sessionization
> ------------------------------
>
> Key: SPARK-10816
> URL: https://issues.apache.org/jira/browse/SPARK-10816
> Project: Spark
> Issue Type: New Feature
> Components: Structured Streaming
> Reporter: Reynold Xin
> Priority: Major
> Attachments: SPARK-10816 Support session window natively.pdf, Session
> Window Support For Structure Streaming.pdf
>
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]