ptrendx commented on pull request #19209:
URL: https://github.com/apache/incubator-mxnet/pull/19209#issuecomment-698635453


   Ok, I now understand the reproducibility problem I saw: 
`cudnnSetDropoutDescriptor` is asynchronous and the CUDA stream was not 
properly synchronized, so if `cudnnSetDropoutDescriptor` was picked up by one 
thread and the dropout was picked up by another thread, there was a race 
condition on the CUDA side. I fixed that in the latest commit.
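
   To illustrate the ordering issue described above, here is a minimal sketch 
that uses only the C++ standard library. It is a model, not the actual fix in 
this PR: `DropoutState`, `InitStateAsync` and `SeedThenDropout` are 
hypothetical names, `std::async` stands in for the asynchronous 
`cudnnSetDropoutDescriptor` call, and waiting on the future stands in for 
synchronizing the CUDA stream before another thread uses the state.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>
#include <vector>

// Hypothetical stand-in for the dropout RNG state buffer.
struct DropoutState {
  std::vector<int> rng;
  bool initialized = false;
};

// Models an asynchronous initialization: the call returns immediately and
// the state is filled in later on another thread (assumed analogy to
// cudnnSetDropoutDescriptor enqueueing work on a stream).
std::future<void> InitStateAsync(DropoutState* state, int seed) {
  return std::async(std::launch::async, [state, seed] {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    state->rng.assign(4, seed);
    state->initialized = true;
  });
}

// Seeds the state, waits for the initialization to finish, and only then
// lets a second thread consume it. Returns true if the consumer observed
// fully initialized state.
bool SeedThenDropout(int seed) {
  DropoutState state;
  std::future<void> pending = InitStateAsync(&state, seed);
  assert(pending.valid());

  // Synchronization point (the role a stream synchronization would play).
  // Without this wait, the consumer could read the state while the
  // initialization is still in flight.
  pending.wait();

  bool ok = false;
  std::thread consumer([&] {
    ok = state.initialized && state.rng.size() == 4 && state.rng[0] == seed;
  });
  consumer.join();
  return ok;
}
```

   With the wait in place, the consumer thread always observes the initialized 
state, so a call such as `SeedThenDropout(42)` returns `true`. Removing the 
wait would reintroduce the kind of race described above.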
   
   I still believe that there is a potential problem with resource assignment, 
although it is not something that would typically be hit, since ops are 
typically launched from a single thread. The thread-safe cachedop would be the 
main reason for this to fail, although it would require somebody to seed 
frequently during execution, so that is also not a very common scenario.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

