kevin85421 opened a new pull request #699: URL: https://github.com/apache/submarine/pull/699
### What is this PR for?
The following two pull requests aim to resolve the Out-Of-Memory (OOM) error. However, it is hard for users to predict actual memory usage, so using the memory request and memory limit mechanism to allow memory overcommitment is helpful for them.
* https://github.com/apache/submarine/pull/621
* https://github.com/apache/submarine/pull/510

In this PR, I set the memory limit to twice the memory request to enable memory overcommitment. With this patch, OOM errors can be reduced effectively. This [article](https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits) is a good resource for understanding this PR.

### What type of PR is it?
[Feature]

### Todos

### What is the Jira issue?
https://issues.apache.org/jira/browse/SUBMARINE-948

### How should this be tested?
**Test1**
* Create a distributed TensorFlow MNIST job, and set the memory quota of a worker to 512 MB. To do so, modify [experimentIT.java:90](https://github.com/apache/submarine/blob/master/submarine-test/test-e2e/src/test/java/org/apache/submarine/integration/experimentIT.java#L90) to
```java
experimentPage.fillTfSpec(2, new String[]{"Ps", "Worker"}, new int[]{1, 1}, new int[]{1, 1}, new int[]{512, 512});
```
* Without this PR, this MNIST job is killed due to an Out-Of-Memory error. With this PR, the MNIST job is not killed.

**Test2**
* Describe the experiment pod and check that its memory limit is twice its memory request.
```
kubectl describe pod ${your_experiment_pod}
```
<img width="422" alt="Screenshot 2021-08-06 2 40 42 PM" src="https://user-images.githubusercontent.com/20109646/128474314-bcfc0067-a841-4bdb-8ce2-4014849ffd57.png">

### Screenshots (if appropriate)

### Questions:
* Do the license files need updating? No
* Are there breaking changes for older versions? No
* Does this need new documentation? No

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
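
As an illustration of the rule described in this PR (memory limit set to twice the memory request), the following is a minimal sketch and not the actual Submarine code. The class name `MemoryOvercommit`, its method names, and the `M` quantity suffix are assumptions made for this example only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper illustrating "limit = 2 x request"; not taken from the PR. */
public class MemoryOvercommit {
    // Factor stated in the PR description: the limit is twice the request.
    public static final int LIMIT_FACTOR = 2;

    /** Returns the memory limit in MB for a given memory request in MB. */
    public static int limitMb(int requestMb) {
        if (requestMb <= 0) {
            throw new IllegalArgumentException("memory request must be positive");
        }
        // multiplyExact throws ArithmeticException on int overflow.
        return Math.multiplyExact(requestMb, LIMIT_FACTOR);
    }

    /**
     * Builds a Kubernetes-style resources map with "requests" and "limits".
     * The "M" suffix (megabytes) is an assumption for this sketch.
     */
    public static Map<String, Map<String, String>> resources(int requestMb) {
        Map<String, String> requests = new LinkedHashMap<>();
        requests.put("memory", requestMb + "M");

        Map<String, String> limits = new LinkedHashMap<>();
        limits.put("memory", limitMb(requestMb) + "M");

        Map<String, Map<String, String>> resources = new LinkedHashMap<>();
        resources.put("requests", requests);
        resources.put("limits", limits);
        return resources;
    }

    public static void main(String[] args) {
        // A worker requesting 512 MB, as in Test1, gets a 1024 MB limit.
        System.out.println(resources(512));
    }
}
```

With a 512 MB request, as in Test1, the container is scheduled against 512 MB but may grow to 1024 MB before it is killed, which is the overcommitment the PR relies on.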
