ptrendx commented on issue #20959:
URL: 
https://github.com/apache/incubator-mxnet/issues/20959#issuecomment-1080969760


   The workaround skips the cleanup for all engines, not just the NaiveEngine. 
   
   So, the general problem here is that when you create the dataloader, it 
creates a pool of workers by forking the main process, which copies everything, 
including the engine and the resources it holds. Each forked process then 
destroys its copy of the engine to become a much leaner dataloader worker. That 
would normally also destroy the stream the engine uses, but with the workaround 
commit in place, the destruction of the stream does not happen. The real 
problem is that CUDA does not in fact survive forking, and the fact that it 
seems to work is just a lucky coincidence. That is why the spawn method should 
be used to fix the dataloader - with spawn, the worker processes do not inherit 
anything from the parent and start from a clean state, with no copies to 
destroy.
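   The fork-versus-spawn difference above can be sketched with plain Python 
`multiprocessing` (this is not MXNet code; `state` is a hypothetical stand-in 
for the engine and its CUDA resources):

```python
import multiprocessing as mp

# Hypothetical stand-in for a heavyweight resource (e.g. an engine holding
# CUDA streams). Set once at import time.
state = {"owner": "import-time"}

def _worker(q):
    # Report which copy of `state` this worker process sees.
    q.put(state["owner"])

def probe(method):
    """Start one worker with the given start method; return what it observed."""
    ctx = mp.get_context(method)
    q = ctx.Queue()
    p = ctx.Process(target=_worker, args=(q,))
    p.start()
    seen = q.get()
    p.join()
    return seen

if __name__ == "__main__":
    state["owner"] = "parent-mutated"   # change made only in the parent process
    print("fork :", probe("fork"))      # worker inherits the parent's copy
    print("spawn:", probe("spawn"))     # fresh interpreter re-imports the module
```

   Under "fork" the worker reports "parent-mutated" because it inherits a copy 
of the parent's state; under "spawn" it reports "import-time" because the child 
starts from a clean interpreter and re-imports the module, inheriting nothing - 
which is exactly why spawn avoids copying (and then having to destroy) the 
parent's engine.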
   In principle, the end result should work the same way as it does currently, 
via shared memory, so there should be no visible difference compared to the 
current behavior (if anything, it should actually be slightly faster, since it 
would not need to spend time destroying the copied engine during dataloader 
construction). I guess the error that @TristonC encounters means that there is 
some additional issue in the dataloader: it somehow depends on a variable 
copied from the parent process in order to initiate the communication channel 
with the parent. 
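   The shared-memory channel between a spawned worker and its parent can be 
sketched with the stdlib `multiprocessing.shared_memory` module (Python 3.8+; 
again a hypothetical simplification, not the actual dataloader code - the names 
`_fill` and `produce` are invented for illustration):

```python
from multiprocessing import get_context
from multiprocessing import shared_memory

def _fill(name, n):
    # Worker: attach to the existing shared block by name and write a
    # "batch" of bytes into it. Nothing is inherited from the parent;
    # only the block's name is passed over.
    shm = shared_memory.SharedMemory(name=name)
    shm.buf[:n] = bytes(range(n))
    shm.close()

def produce(n=8):
    """Parent: create a shared block, let a spawned worker fill it, read it back."""
    ctx = get_context("spawn")
    shm = shared_memory.SharedMemory(create=True, size=n)
    try:
        p = ctx.Process(target=_fill, args=(shm.name, n))
        p.start()
        p.join()
        return bytes(shm.buf[:n])
    finally:
        shm.close()
        shm.unlink()

if __name__ == "__main__":
    print(produce())
```

   Note that the only thing crossing the process boundary is the shared block's 
name, which is why a spawned worker can fill batches just as a forked one does; 
an error here would point to the worker depending on some other piece of 
parent state that spawn does not carry over.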


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


