Re: [apache/incubator-mxnet] [RFC] GPU performance improvements in MXNet engine (#18951)

Kellen Sunderland Mon, 17 Aug 2020 17:28:41 -0700

I really like this proposal, thanks for the great write-up, Przemyslaw.

I haven't totally thought through the pros/cons, but would it be possible to 
record a CUDA event (cudaEventRecord) by default after every block of operators 
is called, and have any dependent block of ops wait on it with 
cudaStreamWaitEvent? Would this unblock our GPU worker threads, since we would 
no longer be calling cudaStreamSynchronize?

If I'm understanding correctly, that would be the equivalent of what you're 
proposing in your second scenario (when we have two CUDA streams)? Would it 
have a lot of overhead in scenario 1, where we use the same stream?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/18951#issuecomment-675180856
