[GitHub] [incubator-mxnet] ann-qin-lu commented on issue #20959: GPU memory leak when using gluon.data.DataLoader with num_workers>0

GitBox Mon, 28 Mar 2022 12:21:55 -0700


ann-qin-lu commented on issue #20959:
URL: 
https://github.com/apache/incubator-mxnet/issues/20959#issuecomment-1081046122



   Hi @ptrendx, thanks a lot for the explanation! Now I have a much clearer 
picture of what's going wrong. If the actual root cause is that "CUDA does not 
in fact survive forking", does it mean that multiprocessing with the `fork` 
start method should be avoided from the very beginning?
   
   Just a quick summary of the two approaches we discussed:
   
   * With the workaround that skips the cleanup for all engines, GPU resources 
held by the engine linger whenever the multiprocessing `fork` method is used. 
The proposed solution is to use `spawn` in `gluon.data.DataLoader`. 
@waytrue17, could you help with this?
   * If we revert the workaround, we will see the non-deterministic segfault 
at exit. This segfault could be resolved if the open issue 
[Better handling of the engine 
destruction](https://github.com/apache/incubator-mxnet/issues/19379#) is 
resolved first.
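
For illustration only, here is a minimal sketch of what choosing the `spawn` start method looks like with Python's standard `multiprocessing` module. It is not the Gluon DataLoader implementation and makes no MXNet calls; `get_worker_context`, `load_sample` and `load_all` are hypothetical names invented for this example. The point it shows is that a spawned worker starts in a fresh interpreter rather than inheriting the parent's process state the way a forked child does.

```python
import multiprocessing as mp


def get_worker_context(method="spawn"):
    """Return a multiprocessing context with an explicit start method.

    Hypothetical helper: with "spawn", each worker starts in a fresh
    interpreter instead of being a forked copy of the parent process.
    """
    if method not in mp.get_all_start_methods():
        raise ValueError(f"start method {method!r} is not available here")
    return mp.get_context(method)


def load_sample(index):
    # Hypothetical stand-in for the CPU-side work a data worker would do.
    return index * 2


def load_all(indices, num_workers=2):
    # With spawn, workers re-import the main module, so call this from
    # under an `if __name__ == "__main__":` guard in a script.
    ctx = get_worker_context("spawn")
    with ctx.Pool(processes=num_workers) as pool:
        return pool.map(load_sample, indices)
```

In a script, `load_all(range(4))` would be called from under an `if __name__ == "__main__":` guard. The trade-off is that `spawn` is generally slower to start than `fork`, and everything passed to the workers must be picklable.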


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



