eric-haibin-lin opened a new issue #12842: Memory reservation feature
URL: https://github.com/apache/incubator-mxnet/issues/12842
 
 
   MXNet's GPU memory consumption changes over the course of a training job. As a 
result, the job can easily hit an out-of-memory (OOM) error when it runs in a 
shared environment with limited memory (e.g. multiple people sharing the same 
GPUs in a research lab). TensorFlow code, by contrast, always reserves x GB of 
GPU memory and is never kicked out once the job starts. 
   
   It would be great to have an API for reserving x GB of GPU memory for 
MXNet. 
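
   For illustration only, here is a rough sketch of how the TensorFlow-style 
reservation mentioned above is usually expressed. As far as I understand, 
TensorFlow 1.x takes the reservation as a fraction of the device's memory 
(`per_process_gpu_memory_fraction`) rather than as a size in GB, so the helper 
below converts one into the other. `reservation_fraction` and the 4 GB / 16 GB 
figures are hypothetical names and values chosen for this example; nothing here 
is an existing MXNet API.

```python
def reservation_fraction(reserve_gb, total_gb):
    """Return the share of a GPU's memory that `reserve_gb` represents.

    Hypothetical helper for illustration; not part of MXNet or TensorFlow.
    """
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    if not 0 < reserve_gb <= total_gb:
        raise ValueError("reserve_gb must be in the range (0, total_gb]")
    return reserve_gb / total_gb


# Assumed example values: reserve 4 GB on a 16 GB card.
fraction = reservation_fraction(4, 16)

# With TensorFlow 1.x the fraction would then be passed along these lines
# (kept as comments so the sketch runs without TensorFlow installed):
#   import tensorflow as tf
#   opts = tf.GPUOptions(per_process_gpu_memory_fraction=fraction)
#   sess = tf.Session(config=tf.ConfigProto(gpu_options=opts))
```

   An MXNet equivalent could presumably accept either form; taking the size in 
GB directly, as requested here, would spare users the conversion.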

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
