ann-qin-lu commented on issue #20959: URL: https://github.com/apache/incubator-mxnet/issues/20959#issuecomment-1081046122
Hi @ptrendx, thanks a lot for the explanation! Now I have a much clearer picture of what's going wrong. If the actual root cause (RC) is that "CUDA does not in fact survive forking", does that mean multiprocessing with the `fork` start method should be avoided from the very beginning?

Just a quick summary of the two approaches we discussed:

* With the workaround that skips the cleanup for all engines, GPU resources held by the engine linger whenever the multiprocessing `fork` method is used. The proposed solution is to use `spawn` in Gluon.DataLoader. @waytrue17, could you help with this?
* If we revert the workaround, we will see the non-deterministic segfault at exit. This segfault could be resolved if the open issue [Better handling of the engine destruction](https://github.com/apache/incubator-mxnet/issues/19379#) is resolved first.
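The first approach comes down to starting worker processes with `spawn` instead of `fork`. Here is a minimal sketch of that choice using only Python's standard `multiprocessing` module. It does not touch MXNet's Gluon DataLoader, and it does not assume that class exposes a start-method option. `worker_context` and `run_in_workers` are hypothetical helpers for illustration.

```python
import multiprocessing as mp
from multiprocessing.context import BaseContext
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def worker_context(cuda_initialized: bool) -> BaseContext:
    """Pick a start method for worker processes (hypothetical helper).

    A forked child inherits a copy of the parent's memory, including any
    CUDA state, but that inherited state is not usable in the child.
    "spawn" starts a fresh interpreter instead, so no GPU state is inherited.
    """
    if cuda_initialized:
        return mp.get_context("spawn")
    # Otherwise keep the platform's default start method.
    return mp.get_context()


def run_in_workers(func: Callable[[T], R], items: Iterable[T],
                   cuda_initialized: bool, processes: int = 2) -> List[R]:
    """Map func over items in a pool built from the chosen context.

    With "spawn", func must be picklable (e.g. defined at module top level)
    and the caller must run under an `if __name__ == "__main__":` guard,
    because each worker re-imports the main module.
    """
    ctx = worker_context(cuda_initialized)
    with ctx.Pool(processes=processes) as pool:
        return pool.map(func, items)
```

Called from a script's `if __name__ == "__main__":` block, `run_in_workers(abs, [-1, -2], cuda_initialized=True)` would return `[1, 2]`. The trade-off of `spawn` is slower worker start-up and the requirement that everything sent to the workers be picklable.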