ZitingShen opened a new pull request #1530:
URL: https://github.com/apache/samza/pull/1530


   Symptom: Some Samza jobs crash with a YARN PMEM error even though the 
reported physical-memory-mb metric is much lower than the container size. 
   <img width="761" alt="Screen Shot 2021-09-09 at 3 40 44 PM" 
src="https://user-images.githubusercontent.com/9065044/132929912-9f8495e1-5c6e-4c1c-909a-fafba08cd5d5.png">
   
   
   Cause: The current physical-memory-mb metric counts only the RSS memory of 
the Java process that runs the application and ignores all of its child 
processes, including those that load TensorFlow models and use a lot of memory. 
YARN enforces its limit on the whole container, so the metric under-reports 
actual usage.
   <img width="1105" alt="Screen Shot 2021-09-09 at 3 39 37 PM" 
src="https://user-images.githubusercontent.com/9065044/132929844-a171bd16-0e9c-4250-8f3f-9a5958933f78.png">
   
   Changes: Find all child processes of the Java process that runs the 
application, and report the sum of their RSS memory plus the RSS memory of the 
Java process itself as the physical-memory-mb of the container.
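
   A minimal sketch of the summation described above, not the code in this 
pull request. It assumes the process table is available as lines of 
"pid ppid rss" with RSS in KB (for example from `ps -e -o pid=,ppid=,rss=`), 
and it includes all descendants rather than direct children only. The class 
and method names (`ProcessTreeRss`, `totalRssKb`) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ProcessTreeRss {

  /**
   * Sums the RSS (in KB) of rootPid and every process descended from it.
   * Each line is assumed to hold three whitespace-separated fields:
   * pid, parent pid, and RSS in KB. Lines that do not parse are skipped.
   */
  public static long totalRssKb(List<String> psLines, int rootPid) {
    Map<Integer, Long> rssByPid = new HashMap<>();
    Map<Integer, List<Integer>> childrenByPid = new HashMap<>();

    for (String line : psLines) {
      String[] fields = line.trim().split("\\s+");
      if (fields.length != 3) {
        continue; // blank or malformed line
      }
      try {
        int pid = Integer.parseInt(fields[0]);
        int ppid = Integer.parseInt(fields[1]);
        long rssKb = Long.parseLong(fields[2]);
        rssByPid.put(pid, rssKb);
        childrenByPid.computeIfAbsent(ppid, k -> new ArrayList<>()).add(pid);
      } catch (NumberFormatException e) {
        // e.g. a header row such as "PID PPID RSS"; ignore it
      }
    }

    // Walk the process tree from the root, adding each process's RSS once.
    long totalKb = 0;
    Set<Integer> visited = new HashSet<>();
    Deque<Integer> pending = new ArrayDeque<>();
    pending.push(rootPid);
    while (!pending.isEmpty()) {
      int pid = pending.pop();
      if (!visited.add(pid)) {
        continue; // guard against self-parented entries
      }
      totalKb += rssByPid.getOrDefault(pid, 0L);
      List<Integer> children =
          childrenByPid.getOrDefault(pid, Collections.<Integer>emptyList());
      for (int child : children) {
        pending.push(child);
      }
    }
    return totalKb;
  }
}
```

   Dividing the result by 1024 gives a value in MB, comparable to the 
container size that YARN enforces.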
   
   Test: unit tests



