KellenSunderland edited a comment on issue #18951:
URL: 
https://github.com/apache/incubator-mxnet/issues/18951#issuecomment-675180856


   I really like this proposal, thanks for the great write-up Przemyslaw.
   
   I haven't totally thought through the pros and cons, but would it be 
possible to record a CUDA event (cudaEventRecord) by default after every block 
of operators is called, and have any dependent block of ops wait on that event 
with cudaStreamWaitEvent? Would this unblock our GPU worker threads, since we 
would no longer be calling cudaStreamSynchronize?
   
   If I'm understanding correctly, this would be equivalent to what you're 
proposing in your second scenario (when we have two CUDA streams)? Would it 
have a lot of overhead in scenario 1, where we use the same stream?
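
   To make the question concrete, here is a minimal sketch of the idea using 
only the plain CUDA runtime API. It is not MXNet engine code: the kernels 
(add_one, times_two) and the helper run_dependent_blocks are hypothetical 
names made up for illustration, and each "block of operators" is reduced to a 
single kernel launch. The producer records an event after its block, and the 
dependent block waits on that event on the device, so the host thread only 
blocks once, when it needs the result.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Report a failing CUDA call and bail out. A sketch only: resources are not
// released on the error path.
#define CUDA_CHECK(call)                                            \
  do {                                                              \
    cudaError_t err_ = (call);                                      \
    if (err_ != cudaSuccess) {                                      \
      std::fprintf(stderr, "%s failed: %s\n", #call,                \
                   cudaGetErrorString(err_));                       \
      return -1;                                                    \
    }                                                               \
  } while (0)

// Hypothetical stand-ins for two dependent blocks of operators.
__global__ void add_one(float* data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] += 1.0f;
}

__global__ void times_two(float* data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= 2.0f;
}

// Runs producer block -> event -> consumer block.
// same_stream == false: two streams (scenario 2).
// same_stream == true:  one stream (scenario 1); the wait is redundant there
//                       because a single stream already runs work in order.
// Returns the number of wrong elements (0 on success) or -1 on a CUDA error.
int run_dependent_blocks(bool same_stream) {
  const int n = 1 << 20;
  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;

  cudaStream_t producer, consumer;
  CUDA_CHECK(cudaStreamCreateWithFlags(&producer, cudaStreamNonBlocking));
  if (same_stream) {
    consumer = producer;
  } else {
    CUDA_CHECK(cudaStreamCreateWithFlags(&consumer, cudaStreamNonBlocking));
  }

  // Timing is disabled because the event is used purely for ordering.
  cudaEvent_t block_done;
  CUDA_CHECK(cudaEventCreateWithFlags(&block_done, cudaEventDisableTiming));

  float* d = nullptr;
  CUDA_CHECK(cudaMalloc(&d, n * sizeof(float)));
  CUDA_CHECK(cudaMemsetAsync(d, 0, n * sizeof(float), producer));

  // First block of ops, then mark its completion point in the producer stream.
  add_one<<<blocks, threads, 0, producer>>>(d, n);
  CUDA_CHECK(cudaGetLastError());
  CUDA_CHECK(cudaEventRecord(block_done, producer));

  // Dependent block: the wait is enqueued on the device; the host thread
  // returns immediately instead of blocking in cudaStreamSynchronize.
  CUDA_CHECK(cudaStreamWaitEvent(consumer, block_done, 0));
  times_two<<<blocks, threads, 0, consumer>>>(d, n);
  CUDA_CHECK(cudaGetLastError());

  // The host blocks only here, when it actually needs the data.
  std::vector<float> h(n);
  CUDA_CHECK(cudaMemcpyAsync(h.data(), d, n * sizeof(float),
                             cudaMemcpyDeviceToHost, consumer));
  CUDA_CHECK(cudaStreamSynchronize(consumer));

  int wrong = 0;
  for (int i = 0; i < n; ++i) {
    if (h[i] != 2.0f) ++wrong;  // (0 + 1) * 2
  }

  CUDA_CHECK(cudaFree(d));
  CUDA_CHECK(cudaEventDestroy(block_done));
  if (!same_stream) CUDA_CHECK(cudaStreamDestroy(consumer));
  CUDA_CHECK(cudaStreamDestroy(producer));
  return wrong;
}
```

   Calling run_dependent_blocks(false) exercises the two-stream case and 
run_dependent_blocks(true) the single-stream case. In the single-stream case 
the event adds no ordering that the stream does not already provide, so the 
open question is only how much the extra cudaEventRecord and 
cudaStreamWaitEvent calls cost per block; this sketch does not measure that.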

