echauchot commented on issue #11055: [BEAM-9436] Improve GBK in spark 
structured streaming runner
URL: https://github.com/apache/beam/pull/11055#issuecomment-605020089
 
 
   > Yes, I agree that materialisation and out of memory should be addressed in different Jira/PR
   > Could you post the Nexmark results before and after your fix to compare? Thanks
   
   Sure, here are the Nexmark results. They are not very informative, because
the results are about the same before and after the change. Note that a 0.2s
difference is not a real difference: I usually get a 0.2s difference between
2 consecutive runs on the same code base.
   It is because these Nexmark results are not very informative that I ran the
load tests above. Indeed, Nexmark does a lot more than just a GBK, so I used
GroupByKeyLoadTest as a pure GBK test.
   
   after the change:
     Conf  Runtime(sec)    (Baseline)  Events(/sec)    (Baseline)       Results    (Baseline)
     0000           1,6                     61349,7                      100000
     0001           0,9                    107758,6                       92000
     0002           0,5                    201612,9                         351
     0003  *** not run ***
     0004           1,8                     11415,5                          40
     0005           1,6                     60864,3                          12
     0006           0,8                     25641,0                         103
     0007           2,2                     91116,2                           1
     0008           0,9                    219298,2                        6000
     0009           0,6                     32894,7                         298
     0010           1,1                     88028,2                           1
     0011           0,9                    110375,3                        1919
     0012           0,6                    160771,7                        1919
     0013           0,6                    180505,4                       92000
     0014           1,0                     99304,9                       92000
   ============================================================================
   before the change:
     0000           1,8                     56243,0                      100000
     0001           0,8                    120481,9                       92000
     0002           0,4                    223713,6                         351
     0003  *** not run ***
     0004           1,6                     12232,4                          40
     0005           1,4                     71275,8                          12
     0006           0,9                     23148,1                         103
     0007           2,1                     96711,8                           1
     0008           1,1                    183486,2                        6000
     0009           0,6                     34843,2                         298
     0010           1,2                     85543,2                           1
     0011           0,9                    114547,5                        1919
     0012           0,6                    156006,2                        1919
     0013           0,5                    203666,0                       92000
     0014           1,0                    102145,0                       92000
   ============================================================================
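
   As a quick check of the "0.2s is noise" remark, here is a minimal sketch that
compares the Runtime(sec) columns of the two tables above. The runtimes are
copied from those tables (decimal commas written as points); the names
`NOISE_SEC`, `runtime_deltas` and `outside_noise` are made up for this
illustration and are not part of Nexmark or Beam.

```python
# Runtime(sec) per Nexmark query, copied from the tables above.
# Query 0003 was not run, so it is left out.
after = {
    "0000": 1.6, "0001": 0.9, "0002": 0.5, "0004": 1.8, "0005": 1.6,
    "0006": 0.8, "0007": 2.2, "0008": 0.9, "0009": 0.6, "0010": 1.1,
    "0011": 0.9, "0012": 0.6, "0013": 0.6, "0014": 1.0,
}
before = {
    "0000": 1.8, "0001": 0.8, "0002": 0.4, "0004": 1.6, "0005": 1.4,
    "0006": 0.9, "0007": 2.1, "0008": 1.1, "0009": 0.6, "0010": 1.2,
    "0011": 0.9, "0012": 0.6, "0013": 0.5, "0014": 1.0,
}

# Run-to-run variation quoted in the comment (assumed to apply to every query).
NOISE_SEC = 0.2


def runtime_deltas(before, after):
    """Return after - before per query, rounded to the table's 0.1s precision."""
    return {q: round(after[q] - before[q], 1) for q in sorted(before)}


deltas = runtime_deltas(before, after)

# Queries whose runtime changed by more than the quoted noise level.
outside_noise = [q for q, d in deltas.items() if abs(d) > NOISE_SEC]

for query, delta in deltas.items():
    print(f"{query}: {delta:+.1f}s")
print("queries outside noise:", outside_noise)
```

   With these numbers, no query moves by more than 0.2s in either direction,
which matches the statement that Nexmark does not separate the two code
versions.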

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
