valmat07 opened a new pull request, #12173: URL: https://github.com/apache/tvm/pull/12173
Profiling of OpenCL runs was hanging because nested timer runs were not handled correctly. Each call to `Timer::Start` on an OpenCL device clears the OpenCL event queue. Because the queue is cleared on every `Start` call, the timer returns the time of only one run instead of the time of all `number` runs. As a result, when the minimum run time is set greater than zero, it is never reached and profiling hangs.

This happens in the following loop: https://github.com/apache/tvm/blob/75ec1cffa9f160dd3165fedbf4408731ebfa797a/src/runtime/profiling.cc#L876-L891

Here, `pf.CallPacked(args, &temp)` calls a function that itself contains another call to `Timer::Start` and `Timer::End`.

Profiling also hung for `__nop` functions such as `reshape`, because no OpenCL calls are issued for these nodes. The OpenCL timer therefore returns 0, which is less than the minimum time value. The `__nop` functions are now skipped during profiling.
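The failure mode described above can be reproduced with a small toy model. The sketch below is not TVM code: `ToyDeviceTimer`, `timed_op`, `measure`, `calibrate`, the doubling of `number`, and the `max_attempts` cap are all hypothetical names and choices made for illustration. It only assumes what the description states: a nested `Start` clears a shared event queue, and the outer loop keeps retrying until the measured time reaches a minimum.

```python
class ToyDeviceTimer:
    """Toy stand-in for a device timer backed by one shared event queue."""

    def __init__(self, nested_start_clears):
        self.nested_start_clears = nested_start_clears
        self.events_ns = []  # durations of the kernels currently in the queue

    def start_nested(self):
        # Models the reported behaviour: a nested start wipes earlier events.
        if self.nested_start_clears:
            self.events_ns.clear()

    def record_kernel(self, duration_ns):
        self.events_ns.append(duration_ns)

    def elapsed_ns(self):
        return sum(self.events_ns)


def timed_op(timer, kernel_ns):
    """An operator that wraps its own kernel in a nested timer run."""
    timer.start_nested()
    if kernel_ns is not None:  # None models a __nop node: no device call at all
        timer.record_kernel(kernel_ns)


def measure(timer, number, kernel_ns):
    """Outer measurement: total time of `number` back-to-back operator calls."""
    timer.events_ns.clear()  # the outer run always begins with an empty queue
    for _ in range(number):
        timed_op(timer, kernel_ns)
    return timer.elapsed_ns()


def calibrate(timer, kernel_ns, min_repeat_ns, max_attempts=16):
    """Grow `number` until one measurement lasts at least `min_repeat_ns`.

    Returns the `number` that satisfied the minimum, or None if it was never
    reached. The cap exists only so the sketch terminates; an uncapped loop
    would spin forever in the None case.
    """
    number = 1
    for _ in range(max_attempts):
        if measure(timer, number, kernel_ns) >= min_repeat_ns:
            return number
        number *= 2  # hypothetical growth rule, chosen for simplicity
    return None
```

With a 1000 ns kernel and a 10000 ns minimum, `calibrate(ToyDeviceTimer(True), 1000, 10000)` returns `None`, because every measurement sees only the last run, while `calibrate(ToyDeviceTimer(False), 1000, 10000)` returns `16`. Passing `kernel_ns=None` also returns `None` in either mode, which mirrors the `__nop` case: no events are ever recorded, so such nodes have to be skipped rather than timed.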
