eric-haibin-lin opened a new issue #12842: Memory reservation feature
URL: https://github.com/apache/incubator-mxnet/issues/12842

MXNet's GPU memory consumption changes over the course of a training job. As a result, a job can easily hit an out-of-memory (OOM) exception when it runs in a shared environment with limited memory (for example, multiple people sharing the same GPUs in a research lab). TF code, by contrast, always reserves x GB of GPU memory and is never kicked out once the job starts. It would be great to have an API that reserves x GB of GPU memory for MXNet.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
