Abacn commented on issue #23904:
URL: https://github.com/apache/beam/issues/23904#issuecomment-2013060164

   After bumping to 50 workers, the test passed. It takes 2 h to run. Throughput 
looks like this:
   
   input/output PCollection of GBK:
   
   
![image](https://github.com/apache/beam/assets/8010435/1b0887c7-0884-4b7c-807b-b36210523782)
   
   ------
   
   However, with 5 workers the problem isn't that the test fails to finish in 
time; the pipeline just gets stuck after some time:
   
   input/output PCollection of GBK:
   
   
![image](https://github.com/apache/beam/assets/8010435/c526fdbd-d93f-41c2-986b-430b2ea8a909)
   
   and worker crashes happened throughout the pipeline run:
   
   number of workers:
   
   
![image](https://github.com/apache/beam/assets/8010435/de7a1e9c-ff7a-4f90-81b8-40db8e4efdc2)
   
   memory usage:
   
   
![image](https://github.com/apache/beam/assets/8010435/88269851-b09c-408b-bb41-f9adddd8f657)
   
   In summary, what happens is
   
   - If the number of workers is not large enough, each worker appears to 
accumulate more work, eventually causing OOM, and the pipeline gets stuck 
(presumably repeating retry - OOM - retry - OOM)
   
   - If there are enough workers, no worker crashes and the data can be 
processed in time, though slowly
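
The two cases above can be illustrated with a toy model. This is only a sketch of the suspected mechanism, not a reproduction of the actual test or of Dataflow's behavior: the function `simulate` and every number in it (arrival rate, per-worker drain rate, memory limit expressed in items) are hypothetical values chosen for illustration, not measurements from this issue.

```python
def simulate(num_workers, steps=100, arrival_per_step=1000,
             drain_per_worker=30, mem_limit_items=2000):
    """Toy model of per-worker backlog. Returns (completed_items, oom_crashes).

    All parameters are hypothetical; they are not taken from the real test.
    """
    backlog = [0] * num_workers   # items buffered on each worker
    pending_retry = 0             # items to redeliver after a crash
    completed = 0
    crashes = 0
    for _ in range(steps):
        # New input plus anything that must be retried is split evenly.
        incoming = arrival_per_step + pending_retry
        pending_retry = 0
        share, extra = divmod(incoming, num_workers)
        for w in range(num_workers):
            backlog[w] += share + (1 if w < extra else 0)
            if backlog[w] > mem_limit_items:
                # Assumed OOM: the worker loses its buffer and the
                # items are retried, so no progress is made on them.
                crashes += 1
                pending_retry += backlog[w]
                backlog[w] = 0
                continue
            done = min(backlog[w], drain_per_worker)
            backlog[w] -= done
            completed += done
    return completed, crashes


if __name__ == "__main__":
    for n in (5, 50):
        done, ooms = simulate(n)
        print(f"workers={n}: completed={done}, oom_crashes={ooms}")
```

Under these assumed numbers, 50 workers keep up with the input and never crash, while 5 workers each receive more than they can drain, hit the assumed memory limit, and have their work retried on top of new input, so crashes repeat and little is completed.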
   
   
   


