samskalicky commented on issue #12842: Memory reservation feature
URL: https://github.com/apache/incubator-mxnet/issues/12842#issuecomment-430439259
 
 
   Interesting idea @eric-haibin-lin. Shouldn't this be handled outside of 
MXNet, by something like Slurm, where a layer outside of the job (i.e. MXNet) 
contains the process's resource usage at the OS level? Typically, when you 
submit jobs in a shared environment, you say upfront how many cores and how 
much memory your job needs, and if the process uses more than its allocation, 
the job gets killed. 
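
   As a rough illustration of containing a process from the outside, here is a 
minimal sketch using only Python's standard library on a POSIX system. It caps 
a child process's address space with `RLIMIT_AS` before the child starts. This 
is an assumption-laden stand-in: schedulers like Slurm enforce limits with 
their own mechanisms, and under `RLIMIT_AS` the allocation fails instead of the 
job being killed. The helper name `run_with_memory_cap` is hypothetical.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(alloc_bytes, cap_bytes):
    """Run a child Python process that allocates alloc_bytes under a memory cap.

    Returns the child's exit code (0 if the allocation succeeded).
    Hypothetical helper for illustration; POSIX only.
    """

    def apply_cap():
        # Runs in the child after fork and before exec, so only the
        # child is constrained; the parent keeps its own limits.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = cap_bytes
        if hard != resource.RLIM_INFINITY:
            # A process may not raise its soft limit above the hard limit.
            soft = min(soft, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    # The "job": allocate one zero-filled buffer of the requested size.
    child_code = "buf = bytearray({})".format(alloc_bytes)
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        preexec_fn=apply_cap,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode
```

   A job that stays within its cap exits with 0. One that asks for more than 
the cap gets a `MemoryError` in the child and exits with a nonzero code.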
   
   I'm not sure anything like this is commonly available for GPUs (outside of 
the way cloud providers like EC2 structure their hypervisors to split hardware 
access, GPUs included, between guest OSes). Still, IMO something outside of 
MXNet at the OS level would be better for containing resources between users. 
Lmk what you think.
   
   I guess the general idea of reserving memory upfront is good for 
performance, since it avoids individual allocations later on during the run. 
This idea should apply to GPU (or any other accelerator) memory too. It might 
also help with the multi-user scenario you mention, where you can get the 
memory you need upfront (and fail early) rather than run out of memory late 
in training. 
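
   To make the reserve-upfront, fail-early idea concrete, here is a minimal 
sketch in plain Python. It is not MXNet's allocator or any existing API: the 
`ReservedPool` class is hypothetical, and it uses host memory (a `bytearray`) 
as a stand-in for GPU memory. The whole reservation happens in the constructor, 
and later requests are carved out of it without asking the system for more.

```python
class ReservedPool:
    """Hypothetical bump allocator over a buffer reserved at construction."""

    def __init__(self, nbytes):
        # The entire reservation is made here, so a shortfall surfaces
        # as MemoryError at startup instead of partway through a run.
        self._buf = bytearray(nbytes)
        self._offset = 0

    def alloc(self, nbytes):
        """Return a writable view of nbytes carved from the reservation."""
        end = self._offset + nbytes
        if end > len(self._buf):
            raise MemoryError(
                "reservation exhausted: short by %d bytes" % (end - len(self._buf))
            )
        view = memoryview(self._buf)[self._offset:end]
        self._offset = end
        return view

    def reset(self):
        """Make the whole reservation available again (e.g. between batches)."""
        self._offset = 0


# Usage: reserve once, then allocate from the pool during the run.
pool = ReservedPool(1024)
weights = pool.alloc(256)
weights[0] = 7  # writes land in the reserved buffer, no new allocation
```

   A bump allocator is the simplest choice for a sketch. A real pool would 
also need per-block frees and some handling of fragmentation.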
   
   Thoughts?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
