ptrendx commented on issue #16716: [Numpy] Fix collect_params().zero_grad() in gluon numpy interface
URL: https://github.com/apache/incubator-mxnet/pull/16716#issuecomment-551264243

Ok, let me address those comments one point at a time :-).

- Usage of TVM/nvrtc - I am generally in favor of that (even though it is harder than it looks, because those arrays do not have the same shape and the imperative nature of the code makes it tricky to know when such horizontal fusion can happen), but it is not a short-term solution for this problem.
- Other cases that look similar - I agree with you, a more general long-term solution is needed.
- `reset_arrays` being in the contrib directory - that is unfortunate placement, I agree (the first sketch after this list shows the two zeroing strategies in question).
- Source of the performance overhead - no. I strongly encourage you to look at a profiler (something like `nvprof`, not the MXNet profiler, since that one only tells you how much time an operator takes, not how much of that time was actually spent on the GPU) and see for yourself; a rough timing sketch also follows this list. I agree that the FFI and creating (and destroying) the engine op take some time (which could be reduced by e.g. keeping a pool of `ThreadedOpr` objects). The main source of overhead in the GPU case, however, is that each operation needs to synchronize after launching its kernel (since the GPU is asynchronous with respect to the host CPU) in order to update engine dependencies. For super short operations like zeroing an array, this not only adds the cost of the sync itself (the `cudaStreamSynchronize` at the end of the operator), it also completely exposes the kernel launch overhead of the next operator, because the ability to queue multiple launches on an asynchronous GPU is lost if you have to sync after every one of them.
- `Also, doing slow things is not always bad.` - my HPC soul screams in terror when reading this :-P. I am not against having simple abstractions for the user - in fact, I am all for it. The role of the framework, though, is to internally take those simple abstractions and transform them into efficient execution.
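
For context, here is a minimal Python sketch of the two zeroing strategies being compared. It is illustrative only: the helper names (`zero_grad_per_array`, `zero_grad_batched`) are mine, and it assumes the contrib operator is exposed as `mx.nd.reset_arrays` with a `num_arrays` keyword, as on the classic ndarray path.

```python
import mxnet as mx

def zero_grad_per_array(params):
    """Zero each gradient with its own engine op: one tiny GPU kernel,
    and one engine dependency update, per array."""
    for p in params.values():
        for g in p.list_grad():
            g[:] = 0

def zero_grad_batched(params):
    """Hand every gradient to a single engine op, so launch and
    synchronization cost is paid once for the whole set."""
    arrays = [g for p in params.values() for g in p.list_grad()]
    if arrays:
        mx.nd.reset_arrays(*arrays, num_arrays=len(arrays))
```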
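
To make the synchronization point concrete, here is a rough host-side timing sketch (my own, not from the PR). It approximates the engine-internal per-operator sync with an explicit `wait_to_read()` after each tiny op, then contrasts that with queueing all the launches and synchronizing once; on a GPU the first variant is dominated by exposed launch and sync latency.

```python
import time
import mxnet as mx

ctx = mx.gpu(0)
grads = [mx.nd.ones((1024,), ctx=ctx) for _ in range(500)]
mx.nd.waitall()

# Variant 1: sync after every tiny op. Each kernel launch's latency is
# fully exposed, mimicking an engine that synchronizes per operator.
start = time.time()
for g in grads:
    g[:] = 0
    g.wait_to_read()
print("sync per op :", time.time() - start)

# Variant 2: queue all launches, synchronize once. The launch overhead
# of one op overlaps with the execution of the previous ones.
start = time.time()
for g in grads:
    g[:] = 0
mx.nd.waitall()
print("sync once   :", time.time() - start)
```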
