ptrendx commented on issue #19360: URL: https://github.com/apache/incubator-mxnet/issues/19360#issuecomment-710476208
Ok, so I think I understand this issue more - the problem is that `shared_ptr` to the engine is a static variable here: https://github.com/apache/incubator-mxnet/blob/master/src/engine/engine.cc#L62 and so the destruction timing of the engine itself is not specified (depends on the order of binaries in the linked executable). This makes it possible for CUDA deinitialization to happen before or after the destruction of the engine. If it happens after then everything is OK, because as part of its destruction engine actually joins on the side threads. However, if the CUDA deinit happens before, then side thread doing the cleanup actually triggers the segfault. The easiest workaround would be to just skip cleanup on a side thread - @szha @mseth10 @leezu do you think that would be acceptable? Any other ideas? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
