ptrendx commented on pull request #19426:
URL: https://github.com/apache/incubator-mxnet/pull/19426#issuecomment-844501183


   > Do you have any data on the overheads involved in RTC launch vs. compiled 
kernel launch, e.g. on the first iteration and thereafter (perhaps for both 
hybridized and unhybridized models)?
   
   There is a 10ms-100ms overhead on the first launch of a given kernel, since it needs to be compiled before use. After compilation the kernel is stored in a cache and any subsequent call is fast - I measured ~2us of overhead for constructing the kernel code and performing the cache lookup, which is comparable to the cost of cudaLaunchKernel itself. There is no real difference between hybridized and non-hybridized models, since the functionality works irrespective of hybridization.
   
   > 
   > I'm sorry to see all those floating point constants in the MXNet RTC code. 
Are there no compiler-defined constants that can be used, or is there a 
motivation for avoiding them?
   
   None of the floating point constants are compiler-defined - they all come from header files (e.g. <cfloat>). The motivation for avoiding external headers is twofold: it sidesteps potential issues with locating the headers, and in NVRTC we cannot include any header that contains host-only code.
   
   > 
   > Having worked on these reduce functions quite a bit, you probably have a 
good sense of the level of testing. Do you feel it's adequate? Can RTC-based 
reduction invoke any new regions of the operator parameter space?
   
   I think the level of testing is generally adequate, and the change to RTC does not introduce any additional parameters to test. It actually consolidates the functionality and so improves test coverage: previously some functions used customized versions of the kernel (e.g. from `src/operator/numpy/linalg/broadcast_reduce_customized-inl.cuh`), while now all the use cases are handled by the same kernel code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
