DickJC123 edited a comment on issue #15167: Pointwise fusion for GPU
URL: https://github.com/apache/incubator-mxnet/pull/15167#issuecomment-551332574
 
 
   After some investigation, I have an explanation and a planned fix for the perf 
regression. To repeat what @ptrendx mentioned, runtime compilation of fused 
kernels costs additional time up front, on the expectation that over many 
kernel invocations the compile time is more than made up for by the increased 
efficiency of the fused op.  This matches the typical use case (unlike CI), so 
I believe that fusion should be left enabled by default.
   
   One saving grace for each of the 3 tests mentioned by @rondogency is that 
most of the fused ops in the test are duplicates of others seen earlier in the 
test.  In fact, fewer than 2% of the created fused ops are unique.  To fix 
this, I will submit a PR introducing a 'fused op cache' that maps 
(source code, gpu_arch) -> runnable kernel.  This should eliminate most of the 
runtime compilations and largely correct the issue flagged here.
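
   To illustrate the caching idea, here is a minimal Python sketch, not the 
actual MXNet implementation (which would live in the C++ backend).  The names 
`FusedOpCache` and `fake_compile` are hypothetical, and `fake_compile` only 
stands in for a real runtime compiler.  The sketch shows the one property that 
matters: a kernel is compiled once per distinct (source code, gpu_arch) pair 
and reused afterwards.

```python
from typing import Callable, Dict, Tuple

# A "kernel" here is just any callable; in a real backend it would be a
# handle to a compiled GPU function.
Kernel = Callable[[], str]
CompileFn = Callable[[str, int], Kernel]


class FusedOpCache:
    """Hypothetical cache mapping (source code, gpu_arch) -> runnable kernel."""

    def __init__(self, compile_fn: CompileFn) -> None:
        self._compile_fn = compile_fn
        self._kernels: Dict[Tuple[str, int], Kernel] = {}
        self.hits = 0
        self.misses = 0

    def get(self, source: str, gpu_arch: int) -> Kernel:
        key = (source, gpu_arch)
        kernel = self._kernels.get(key)
        if kernel is None:
            # First time this (source, arch) pair is seen: pay the compile cost.
            self.misses += 1
            kernel = self._compile_fn(source, gpu_arch)
            self._kernels[key] = kernel
        else:
            # Duplicate fused op: reuse the already compiled kernel.
            self.hits += 1
        return kernel


def fake_compile(source: str, gpu_arch: int) -> Kernel:
    """Stand-in for runtime compilation; returns a trivial callable."""
    return lambda: f"ran {len(source)}-char kernel on sm_{gpu_arch}"


if __name__ == "__main__":
    cache = FusedOpCache(fake_compile)
    for _ in range(50):
        cache.get("out = a + b * c;", 70)   # same source, same arch
    cache.get("out = a + b * c;", 75)       # same source, different arch
    print(cache.misses, cache.hits)
```

   The GPU architecture is part of the key because a kernel compiled for one 
architecture is not assumed to be runnable on another.  This sketch is not 
thread-safe; a real cache shared across threads would need a lock around the 
lookup and insert.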

