echuraev opened a new pull request, #16061:
URL: https://github.com/apache/tvm/pull/16061

   Several NVIDIA tools, such as Nsight Systems and Nsight Compute, can be used 
for profiling CUDA kernels. NVIDIA Nsight Systems collects system-wide 
information about your program and GPU events, which can help you find 
possible bottlenecks in your topology. To profile a specific CUDA kernel, use 
NVIDIA Nsight Compute.
   
   If you profile a CUDA kernel from TVM with Nsight Compute without this 
patch, you see only SASS instructions instead of the source code. That is 
useful, but it is sometimes easier to analyze the generated CUDA code than the 
instructions. This patch adds a new pass config option, 
`cuda.kernels_output_dir`, which specifies the directory where the CUDA 
source code is stored after the build. When this option is set, the CUDA 
kernels are also compiled with the `-lineinfo` option, which is similar to 
the `-g` option in GCC. When CUDA kernels are compiled with `-lineinfo`, 
Nsight Compute can map profile information to the source code. One important 
note: to see the source code in Nsight Compute, you have to set the 
`Import Source` parameter to `Yes` when configuring the profiling session.
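
   As a rough illustration of how the option might be used, the sketch below 
passes `cuda.kernels_output_dir` through `tvm.transform.PassContext`. The 
helper names `kernel_dump_config` and `build_with_kernel_dump` are 
hypothetical, the use of `relay.build` is an assumption about the build flow, 
and it needs a TVM build that includes this patch.

```python
import os


def kernel_dump_config(kernels_dir):
    """Return a PassContext config dict that enables the CUDA source dump."""
    # Assumption: create the directory up front; the PR text does not say
    # whether TVM creates it on its own.
    os.makedirs(kernels_dir, exist_ok=True)
    return {"cuda.kernels_output_dir": kernels_dir}


def build_with_kernel_dump(mod, kernels_dir, params=None):
    """Build a Relay module for CUDA, saving kernel sources to kernels_dir."""
    # Imported lazily so kernel_dump_config works without TVM installed.
    import tvm
    from tvm import relay

    config = kernel_dump_config(kernels_dir)
    # Requires a TVM build with this patch; otherwise the config key is unknown.
    with tvm.transform.PassContext(opt_level=3, config=config):
        return relay.build(mod, target="cuda", params=params)
```

Calling `build_with_kernel_dump(mod, "./cuda_kernels")` on a Relay module 
should then leave the generated CUDA sources in `./cuda_kernels` for Nsight 
Compute to import.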

