[GitHub] [beam] damccorm opened a new issue, #19990: Performance Issues with Beam Runners compared with Native Systems

GitBox Sat, 04 Jun 2022 07:41:46 -0700


damccorm opened a new issue, #19990:
URL: https://github.com/apache/beam/issues/19990


   While evaluating the performance of Apache Beam with the Spark runner, I 
found that even for a simple word count on a text file, Beam with the Spark 
runner was 5 times slower than native Spark, on a dataset as small as 
14 GB.
   
   You will find more details on this evaluation here - 
[https://github.com/soumabrata-chakraborty/spark-vs-beam/blob/master/README.md](https://github.com/soumabrata-chakraborty/spark-vs-beam/blob/master/README.md)
   
   I also came across an analysis titled _Quantitative Impact Evaluation of 
an Abstraction Layer for Data Stream Processing Systems_ 
([https://arxiv.org/pdf/1907.08302.pdf](https://arxiv.org/pdf/1907.08302.pdf) / 
[https://ieeexplore.ieee.org/document/8884832](https://ieeexplore.ieee.org/document/8884832))
   
   According to it, most scenarios showed a slowdown of at least a factor 
of 3, with the worst case being a factor of 58!
   
   While an abstraction layer is expected to come with some performance 
cost, the current cost seems to be very high.
   
   Imported from Jira 
[BEAM-9440](https://issues.apache.org/jira/browse/BEAM-9440). Original Jira may 
contain additional context.
   Reported by: soumabrata.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
