| Field | Value |
| --- | --- |
| Issue | 55455 |
| Summary | clang: -ftime-trace output does not include device compilation |
| Labels | new issue |
| Assignees | |
| Reporter | Maetveis |

As the title says, it would be nice if, when using offloading targets (like CUDA / HIP / OpenMP), the final time trace included the traces from the device-side compilations.
These files are already being generated (because the spawned processes inherit the `-ftime-trace` flag), but they are not merged into the final trace. In the case of CUDA they can be found under `/tmp/<file>-<hash>/<file>-<arch>.json`. Under HIP they seem to be overwritten, as they use the same filename as the host compilation.
Having clang merge these files would improve usability for tools like [ninjatracing](https://github.com/nico/ninjatracing) (with the `--embed-time-trace` option), which are not aware of the multiple compilation passes involved in CUDA/HIP builds.
To reproduce, compile the following file with `clang main.cu --offload-arch=sm_35 --offload-arch=sm_80 -ftime-trace`.
```cuda
#ifdef __CUDA_ARCH__
template <unsigned int N>
struct SlowToCompile {
    constexpr static unsigned int value = N + SlowToCompile<N - 1>::value;
};

template <>
struct SlowToCompile<0> {
    constexpr static unsigned int value = 0;
};
#endif

void __global__ kernel(unsigned int* dst) {
#ifdef __CUDA_ARCH__
    *dst = SlowToCompile<1024>::value;
#endif
}

int main(int argc, char** argv) {
    if (argc < 0) {
        kernel<<< dim3(1), dim3(1) >>>(nullptr);
    }
}
```
I would like to help fix this and would like to hear your opinions on my proposed approach:
Add an option to the `-cc1` command line to merge the time trace output into an already existing file. The offloading drivers could then add this option to the device compilation jobs when time trace is enabled.
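
To make the idea of merging more concrete, here is a rough Python sketch of what the merge step could do with the JSON files. It is not clang code and not an existing tool. The function name `merge_time_traces` and the `device <arch>` labels are made up for illustration. It assumes each file is a Chrome-trace-style object with a `traceEvents` array, and that `beginningOfTime`, when present, is in the same unit as the `ts` fields.

```python
import json


def merge_time_traces(host, devices):
    """Merge device traces into a copy of the host trace (hypothetical helper).

    host:    dict parsed from the host-side time trace JSON
    devices: list of (label, trace_dict) pairs, e.g. ("sm_80", {...})
    """
    merged = dict(host)
    events = list(host.get("traceEvents", []))
    host_start = host.get("beginningOfTime")

    # Pick process ids that do not collide with pids already in the host trace.
    used_pids = {e.get("pid", 0) for e in events}
    next_pid = max(used_pids, default=0) + 1

    for label, trace in devices:
        # Shift timestamps onto the host timeline only when both files
        # record an absolute start time (assumption: same unit as "ts").
        dev_start = trace.get("beginningOfTime")
        offset = 0
        if host_start is not None and dev_start is not None:
            offset = dev_start - host_start

        for event in trace.get("traceEvents", []):
            is_metadata = event.get("ph") == "M"
            if is_metadata and event.get("name") == "process_name":
                continue  # replaced by the labelled entry added below
            event = dict(event)  # copy so the input trace is not modified
            event["pid"] = next_pid
            if not is_metadata and "ts" in event:
                event["ts"] += offset
            events.append(event)

        # Metadata event so trace viewers show a readable name for this lane.
        events.append({
            "ph": "M",
            "name": "process_name",
            "pid": next_pid,
            "tid": 0,
            "args": {"name": "device " + label},
        })
        next_pid += 1

    merged["traceEvents"] = events
    return merged
```

The host and per-architecture files would be loaded with `json.load` and the result written back with `json.dump`. Giving each device compilation its own `pid` keeps the host and device passes in separate lanes in a trace viewer, so events that overlap in time are not stacked on top of each other.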