echuraev opened a new pull request, #16061: URL: https://github.com/apache/tvm/pull/16061
Several NVIDIA tools, such as Nsight Systems and Nsight Compute, can be used to profile CUDA kernels. NVIDIA Nsight Systems collects system-wide information about your program and GPU events and can help you understand possible bottlenecks in your topology. To profile a specific CUDA kernel, use NVIDIA Nsight Compute.

Without this patch, profiling a CUDA kernel from TVM with Nsight Compute shows only SASS instructions instead of the source code. That is useful, but it is sometimes easier to analyze the generated CUDA code than the instructions.

This patch adds a new pass config option, `cuda.kernels_output_dir`, which specifies the directory where the CUDA source code is stored after the build. When this option is set, the CUDA kernels are also compiled with `-lineinfo`, which embeds source line information, similar to the `-g` option in GCC. With kernels compiled with `-lineinfo`, Nsight Compute can map profile information to the source code.

One important note: to see the source code in Nsight Compute, you have to set the `Import Source` parameter to `Yes` when configuring the profiling session.
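A minimal sketch of how the new option might be used from Python, assuming a TVM build that includes this patch and has CUDA enabled. The helper names (`make_pass_config`, `build_with_kernel_dump`, `list_dumped_sources`) are hypothetical and not part of TVM. Only the config key `cuda.kernels_output_dir` comes from this PR. The call to `tvm.build` is an assumption about the build entry point and may need to be swapped for whatever your TVM version uses.

```python
import os


def make_pass_config(kernels_dir):
    """Return a pass config dict that enables dumping CUDA sources.

    The directory is created up front as a precaution; whether TVM would
    create it on its own is not assumed here.
    """
    os.makedirs(kernels_dir, exist_ok=True)
    return {"cuda.kernels_output_dir": kernels_dir}


def build_with_kernel_dump(mod, kernels_dir, target="cuda"):
    """Build `mod` with the kernel dump option set (hypothetical helper).

    TVM is imported lazily so that this file can be loaded on machines
    without TVM installed.
    """
    import tvm  # requires a TVM build that contains this patch

    config = make_pass_config(kernels_dir)
    with tvm.transform.PassContext(opt_level=3, config=config):
        # Assumption: tvm.build is the entry point used for this module.
        return tvm.build(mod, target=target)


def list_dumped_sources(kernels_dir):
    """List the files found in the output directory after a build."""
    return sorted(os.listdir(kernels_dir))
```

After a build, `list_dumped_sources` shows which files were written to the directory. The resulting binary can then be profiled with Nsight Compute with `Import Source` set to `Yes`, as described above.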
