[
https://issues.apache.org/jira/browse/BEAM-9436?focusedWorklogId=411129&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-411129
]
ASF GitHub Bot logged work on BEAM-9436:
----------------------------------------
Author: ASF GitHub Bot
Created on: 27/Mar/20 14:54
Start Date: 27/Mar/20 14:54
Worklog Time Spent: 10m
Work Description: echauchot commented on issue #11055: [BEAM-9436]
Improve GBK in spark structured streaming runner
URL: https://github.com/apache/beam/pull/11055#issuecomment-605020089
> Yes, I agree that materialisation and out of memory should be addressed in
> different Jira/PR
> Could you post the Nexmark results before and after your fix to compare?
> Thanks
Sure, here are the Nexmark results. They are not very informative because the
results are essentially the same before and after the change. Note that a 0.2 s
difference is not a real difference, because I usually get a 0.2 s difference
between 2 consecutive runs on the same code base.
It is because these Nexmark results are not very informative that I ran the
load tests above. Indeed, Nexmark does a lot more than just a GBK, so I used
GroupByKeyLoadTest as a pure GBK test.
after the change:
```
Conf  Runtime(sec)  (Baseline)  Events(/sec)  (Baseline)  Results  (Baseline)
0000           1,6                   61349,7               100000
0001           0,9                  107758,6                92000
0002           0,5                  201612,9                  351
0003  *** not run ***
0004           1,8                   11415,5                   40
0005           1,6                   60864,3                   12
0006           0,8                   25641,0                  103
0007           2,2                   91116,2                    1
0008           0,9                  219298,2                 6000
0009           0,6                   32894,7                  298
0010           1,1                   88028,2                    1
0011           0,9                  110375,3                 1919
0012           0,6                  160771,7                 1919
0013           0,6                  180505,4                92000
0014           1,0                   99304,9                92000
=============================================================================
```
before the change:
```
Conf  Runtime(sec)  (Baseline)  Events(/sec)  (Baseline)  Results  (Baseline)
0000           1,8                   56243,0               100000
0001           0,8                  120481,9                92000
0002           0,4                  223713,6                  351
0003  *** not run ***
0004           1,6                   12232,4                   40
0005           1,4                   71275,8                   12
0006           0,9                   23148,1                  103
0007           2,1                   96711,8                    1
0008           1,1                  183486,2                 6000
0009           0,6                   34843,2                  298
0010           1,2                   85543,2                    1
0011           0,9                  114547,5                 1919
0012           0,6                  156006,2                 1919
0013           0,5                  203666,0                92000
0014           1,0                  102145,0                92000
=============================================================================
```
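The claim that the two runs differ only by noise can be checked mechanically. The following is a minimal sketch, not part of the PR: it takes the Runtime(sec) columns from the two tables above and lists any query whose runtime changed by more than the 0.2 s run-to-run variation mentioned in the comment. The names `runtime_deltas`, `outside_noise` and `NOISE` are made up for this illustration.

```python
from decimal import Decimal

# Runtime(sec) per Nexmark query, copied from the two tables above.
# Query 0003 was not run, so it is omitted.
QUERIES = [0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
AFTER = "1,6 0,9 0,5 1,8 1,6 0,8 2,2 0,9 0,6 1,1 0,9 0,6 0,6 1,0"
BEFORE = "1,8 0,8 0,4 1,6 1,4 0,9 2,1 1,1 0,6 1,2 0,9 0,6 0,5 1,0"

# Assumption: 0.2 s is the run-to-run variation quoted in the comment.
NOISE = Decimal("0.2")


def parse(column):
    """Parse a space-separated column of decimal-comma numbers."""
    # Decimal avoids binary floating-point error when comparing to 0.2.
    return [Decimal(value.replace(",", ".")) for value in column.split()]


def runtime_deltas(before, after):
    """Return {query: after - before} in seconds."""
    return {q: a - b for q, b, a in zip(QUERIES, parse(before), parse(after))}


def outside_noise(deltas, noise=NOISE):
    """Keep only the queries whose runtime moved by more than the noise."""
    return {q: d for q, d in deltas.items() if abs(d) > noise}


deltas = runtime_deltas(BEFORE, AFTER)
significant = outside_noise(deltas)
print(significant)  # {}
```

With these numbers no query moves by more than 0.2 s in either direction, which is consistent with using GroupByKeyLoadTest rather than Nexmark to measure the GBK change.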
Issue Time Tracking
-------------------
Worklog Id: (was: 411129)
Time Spent: 14h (was: 13h 50m)
> Improve performance of GBK
> --------------------------
>
> Key: BEAM-9436
> URL: https://issues.apache.org/jira/browse/BEAM-9436
> Project: Beam
> Issue Type: Improvement
> Components: runner-spark
> Reporter: Etienne Chauchot
> Assignee: Etienne Chauchot
> Priority: Major
> Labels: structured-streaming
> Time Spent: 14h
> Remaining Estimate: 0h
>