ptrendx opened a new pull request #19266:
URL: https://github.com/apache/incubator-mxnet/pull/19266


   ## Description ##
   This PR limits the number of kernels compiled by RTC for ElementWiseSum. 
Previously, the inputs passed to the launcher were not limited, which resulted 
in generated code like this:
   ```
   using InputType0 = float32;
   ...
   using InputType10 = float32;
   ```
   Even though the types beyond 4 were not used, the kernel cache treated each 
such variant as a new kernel. More kernels therefore had to be compiled, which 
increased the time needed to start the computation.
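
The effect can be illustrated with a toy model. This is only a sketch, not MXNet's actual RTC code: `KernelCache`, `MakePreamble`, `CountCompilations` and `kMaxInputsPerLaunch` are hypothetical names, and it assumes that the cache is keyed on the generated source text and that a single launch consumes at most 4 inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Assumption: one kernel launch consumes at most this many inputs.
constexpr std::size_t kMaxInputsPerLaunch = 4;

// Builds the type-alias preamble emitted ahead of the kernel body,
// one alias per input that the launcher was told about.
std::string MakePreamble(std::size_t num_inputs) {
  std::string src;
  for (std::size_t i = 0; i < num_inputs; ++i) {
    src += "using InputType" + std::to_string(i) + " = float32;\n";
  }
  return src;
}

// Toy kernel cache keyed on the generated source text (an assumption
// about how the real cache distinguishes kernels).
class KernelCache {
 public:
  // Returns true if the source was not cached, i.e. a compile was needed.
  bool GetOrCompile(const std::string& source) {
    return compiled_.insert(source).second;
  }
  std::size_t size() const { return compiled_.size(); }

 private:
  std::set<std::string> compiled_;
};

// Counts how many distinct kernels get compiled when summing
// kMaxInputsPerLaunch..max_inputs arrays, with or without limiting
// the inputs described to the launcher.
std::size_t CountCompilations(std::size_t max_inputs, bool limit_inputs) {
  KernelCache cache;
  for (std::size_t n = kMaxInputsPerLaunch; n <= max_inputs; ++n) {
    const std::size_t described =
        limit_inputs ? std::min(n, kMaxInputsPerLaunch) : n;
    cache.GetOrCompile(MakePreamble(described));
  }
  return cache.size();
}
```

In this model, `CountCompilations(11, false)` produces a separate cache entry for every input count from 4 to 11, while `CountCompilations(11, true)` reuses a single entry, because the unused aliases no longer change the generated source.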



