ThomasDelteil commented on issue #11243: weird gpu memory usage
URL: https://github.com/apache/incubator-mxnet/issues/11243#issuecomment-397046550
 
 
   @dwSun MXNet executes operations asynchronously. When you load data or run a 
forward or backward pass, the operations are enqueued on the MXNet backend and 
executed once their parent dependencies are available.
   
   With your current script there are no blocking operations, so the Python loop 
runs through the epoch and keeps enqueueing "copy to GPU" operations. These 
operations have no parent dependencies and can be executed immediately, but the 
actual training is not finished when you reach the end of your epoch loop. 
After a few epochs you will clog up your GPU memory.
   
   `print(total_train_loss.asscalar()/training_samples)`
   
   `.asscalar()` is equivalent to `.asnumpy()[0]`, which triggers a synchronous 
copy of the result back to the CPU. When you add this line, the training isn't 
"slow", it is running at its normal speed: every 500 iterations, your network 
has to wait for the computation to complete and return the result to the CPU. 
If your dataset is small and fits in GPU memory, you can call `mx.nd.waitall()` 
at the end of each epoch instead, but that means your entire dataset will be 
copied to the GPU. This makes training pretty fast, since at the beginning of 
each batch the data is already available. However, you might run into an OOM 
error; in that case you can, for example, keep track of your loss with 
`loss_acc += loss.sum().asscalar()`, forcing a copy to the CPU on every batch.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
