ptrendx commented on issue #19360:
URL: 
https://github.com/apache/incubator-mxnet/issues/19360#issuecomment-710476208


   OK, I think I understand this issue better now. The problem is that the 
`shared_ptr` to the engine is a static variable here: 
https://github.com/apache/incubator-mxnet/blob/master/src/engine/engine.cc#L62 
so the destruction timing of the engine itself is not specified (it depends on 
the order of binaries in the linked executable). This makes it possible for 
CUDA deinitialization to happen either before or after the destruction of the 
engine. If it happens after, everything is OK, because as part of its 
destruction the engine joins the side threads. However, if the CUDA deinit 
happens first, a side thread that is still doing cleanup calls into the 
already-deinitialized CUDA runtime and triggers the segfault.
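
   To illustrate the "safe" ordering described above, here is a minimal sketch of an object that joins its side thread in its destructor. `ToyEngine` and `WorkerFinishedBeforeDestructorReturns` are hypothetical names and this is not MXNet's engine code. It only shows that once the destructor has returned, the side thread can no longer be running cleanup. In the real issue the object is a static, so what is unspecified is when that destructor runs relative to CUDA teardown.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the engine: owns one side thread and joins it
// in the destructor. Not MXNet code.
class ToyEngine {
 public:
  explicit ToyEngine(std::atomic<bool>* finished)
      : finished_(finished), worker_([this] { Run(); }) {}

  ~ToyEngine() {
    stop_.store(true);  // ask the side thread to wind down
    worker_.join();     // block until its cleanup has completed
  }

 private:
  void Run() {
    while (!stop_.load()) std::this_thread::yield();
    finished_->store(true);  // stands in for the side thread's cleanup work
  }

  // Declaration order matters: worker_ must be initialized last so the
  // thread never observes uninitialized members.
  std::atomic<bool> stop_{false};
  std::atomic<bool>* finished_;
  std::thread worker_;
};

// Returns true if the side thread's cleanup had completed by the time the
// destructor returned.
bool WorkerFinishedBeforeDestructorReturns() {
  std::atomic<bool> finished{false};
  {
    ToyEngine engine(&finished);
  }  // destructor runs here and joins the side thread
  return finished.load();
}
```

   If anything the side thread depends on is torn down before this destructor runs, the join comes too late, which matches the failing order described above.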
   
   The easiest workaround would be to just skip cleanup on a side thread - 
@szha @mseth10 @leezu do you think that would be acceptable? Any other ideas?
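
   A rough sketch of what "skip cleanup on a side thread" could look like, assuming the owning thread's id is known to the cleanup path. `Resource` and `CleanupUnlessSideThread` are hypothetical names, and the `released` flag stands in for real GPU resource release. This is not a proposed patch against MXNet.

```cpp
#include <cassert>
#include <thread>

// Hypothetical resource; `released` stands in for freeing GPU-side state.
struct Resource {
  bool released = false;
};

// Runs the cleanup only when called from the owning thread. On any other
// thread it returns early, since the runtime it would call into may already
// be gone. Returns whether the cleanup actually ran.
bool CleanupUnlessSideThread(Resource& r, std::thread::id owner) {
  if (std::this_thread::get_id() != owner) {
    return false;  // side thread: skip cleanup entirely
  }
  r.released = true;
  return true;
}
```

   The trade-off is that whatever a side thread would have released is left for the process exit to reclaim.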

