ptrendx commented on issue #16716: [Numpy] Fix collect_params().zero_grad() in gluon numpy interface
URL: https://github.com/apache/incubator-mxnet/pull/16716#issuecomment-551264243
 
 
  Ok, let me address those comments one point at a time :-).
    - usage of TVM/nvrtc - I am generally in favor of that (even though it is harder than it looks, because those arrays do not all have the same shape and the imperative nature of the code makes it tricky to know when such horizontal fusion can happen), but it is not a short-term solution for this problem. A minimal example of the kind of fused kernel I mean is in the first sketch after this list.
    - other cases that look similar - I agree with you, a longer-term general solution is needed.
    - `reset_arrays` being in the contrib directory - that is unfortunate placement, I agree.
    - source of the performance overhead - no. I strongly encourage you to look at a GPU profiler yourself (something like `nvprof`, not the MXNet profiler, which only tells you how much time an operator takes, not how much of that time was actually spent on the GPU). I agree that the FFI and creating (and destroying) the engine op take some time (which could be reduced by e.g. having a pool of `ThreadedOpr`). The main source of overhead in the GPU case, however, is that each operation needs to synchronize after launching its kernel (since the GPU is asynchronous with respect to the host CPU) in order to update engine dependencies. For super short operations like zeroing an array, this not only adds the cost of the sync itself (the `cudaStreamSynchronize` at the end of the operator), but also completely exposes the kernel-launch overhead of the next operator: the benefit of the GPU being asynchronous - that you can queue multiple launches - is completely lost if you need to sync after every one of them. The second sketch after this list illustrates the difference.
    - `Also, doing slow things is not always bad.` - my HPC soul screams in terror when reading this :-P. I am not against having simple abstractions for the user - in fact, I am all for it. The role of the framework, though, is to internally take those simple abstractions and transform them into efficient execution.
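
First sketch: a minimal, hypothetical illustration of horizontally fusing the zeroing of several arrays of different sizes into one kernel launch (roughly the idea behind a fused reset). The kernel and names (`zero_many`, etc.) are mine for illustration, not MXNet's actual `reset_arrays` implementation.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// One block per array; threads stride over that array's elements.
// All arrays are zeroed in a single launch, regardless of their shapes.
__global__ void zero_many(float** ptrs, const size_t* sizes, int num_arrays) {
  for (int a = blockIdx.x; a < num_arrays; a += gridDim.x) {
    float* data = ptrs[a];
    const size_t n = sizes[a];
    for (size_t i = threadIdx.x; i < n; i += blockDim.x) {
      data[i] = 0.f;
    }
  }
}

int main() {
  // Hypothetical gradient buffers of different sizes.
  std::vector<size_t> sizes = {1024, 4096, 37, 1 << 20};
  const int num_arrays = static_cast<int>(sizes.size());

  std::vector<float*> host_ptrs(num_arrays);
  for (int i = 0; i < num_arrays; ++i) {
    cudaMalloc(&host_ptrs[i], sizes[i] * sizeof(float));
  }

  // Copy the pointer/size tables to the device so the kernel can walk them.
  float** d_ptrs;
  size_t* d_sizes;
  cudaMalloc(&d_ptrs, num_arrays * sizeof(float*));
  cudaMalloc(&d_sizes, num_arrays * sizeof(size_t));
  cudaMemcpy(d_ptrs, host_ptrs.data(), num_arrays * sizeof(float*),
             cudaMemcpyHostToDevice);
  cudaMemcpy(d_sizes, sizes.data(), num_arrays * sizeof(size_t),
             cudaMemcpyHostToDevice);

  // One launch (and at most one sync) instead of one per array.
  zero_many<<<num_arrays, 256>>>(d_ptrs, d_sizes, num_arrays);
  cudaDeviceSynchronize();
  printf("zeroed %d arrays in one launch\n", num_arrays);
  return 0;
}
```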
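Second sketch: a simplified timing experiment (not MXNet code) contrasting "sync after every tiny kernel", which is effectively what the per-op dependency update forces, with "queue everything and sync once". The `zero_array` kernel is a stand-in for zeroing one small gradient; actual numbers will depend on the hardware, but the per-launch pattern exposes the launch and sync overhead on every iteration.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Deliberately tiny kernel, standing in for "zero one small gradient array".
__global__ void zero_array(float* data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] = 0.f;
}

int main() {
  const int num_ops = 1000, n = 1024;
  float* data;
  cudaMalloc(&data, n * sizeof(float));

  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);
  float ms = 0.f;

  // Pattern 1: sync after every launch to update dependencies.
  // The launch overhead is fully exposed every single time.
  cudaEventRecord(start);
  for (int i = 0; i < num_ops; ++i) {
    zero_array<<<(n + 255) / 256, 256>>>(data, n);
    cudaStreamSynchronize(0);
  }
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);
  cudaEventElapsedTime(&ms, start, stop);
  printf("sync after every launch: %.3f ms\n", ms);

  // Pattern 2: queue all launches and sync once; launches overlap with
  // kernels already running, so the launch overhead is mostly hidden.
  cudaEventRecord(start);
  for (int i = 0; i < num_ops; ++i) {
    zero_array<<<(n + 255) / 256, 256>>>(data, n);
  }
  cudaStreamSynchronize(0);
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);
  cudaEventElapsedTime(&ms, start, stop);
  printf("sync once at the end:   %.3f ms\n", ms);
  return 0;
}
```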
