valmat07 opened a new pull request, #12173:
URL: https://github.com/apache/tvm/pull/12173

   Profiling of OpenCL runs was hanging because nested timer runs were not 
handled correctly: each call to `Timer::Start` on an OpenCL device clears the 
OpenCL event queue.
   
   The hang occurred because the minimum run time for profiling, when set to 
a value greater than zero, was never reached. Since the queue is cleared on 
every `Start` call, the timer returns the time of only one run rather than of 
`number` runs.
   
   This happens in the following loop:
   
   
https://github.com/apache/tvm/blob/75ec1cffa9f160dd3165fedbf4408731ebfa797a/src/runtime/profiling.cc#L876-L891
   
   Here, `pf.CallPacked(args, &temp)` calls a function that itself contains 
another `Timer::Start` and `Timer::End` call.
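
   To make the nesting described above concrete, here is a minimal, 
self-contained sketch. It does not use the TVM or OpenCL APIs: 
`ToyEventQueue`, `ClearingTimer`, `RunOnceWithInnerTimer` and `MeasureNested` 
are hypothetical names invented for illustration, and the only behaviour 
modelled is the one stated in this description, namely that `Start` clears a 
shared event queue.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for the list of profiling events on an OpenCL queue.
// Each entry is the duration (in ms) of one recorded kernel. Hypothetical.
struct ToyEventQueue {
  std::vector<double> events_ms;
};

// Hypothetical timer mimicking the behaviour described in this PR:
// Start() clears the shared queue, StopMs() sums whatever is left in it.
class ClearingTimer {
 public:
  explicit ClearingTimer(ToyEventQueue* queue) : queue_(queue) {}
  void Start() { queue_->events_ms.clear(); }
  double StopMs() const {
    double total = 0.0;
    for (double d : queue_->events_ms) total += d;
    return total;
  }

 private:
  ToyEventQueue* queue_;
};

// Inner function that times itself, like the function behind CallPacked.
inline void RunOnceWithInnerTimer(ToyEventQueue* queue, double kernel_ms) {
  ClearingTimer inner(queue);
  inner.Start();                          // wipes events from earlier runs
  queue->events_ms.push_back(kernel_ms);  // "launch" one kernel
  inner.StopMs();
}

// Outer measurement of `number` runs; returns what the outer timer reports.
inline double MeasureNested(int number, double kernel_ms) {
  assert(number > 0);
  ToyEventQueue queue;
  ClearingTimer outer(&queue);
  outer.Start();
  for (int i = 0; i < number; ++i) RunOnceWithInnerTimer(&queue, kernel_ms);
  return outer.StopMs();  // only the last run's event survived
}
```

   In this toy model, `MeasureNested(10, 2.0)` reports the duration of a 
single run, not of ten, because every inner `Start` discards the events 
recorded before it.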
   
   Profiling also hung for `__nop` functions such as `reshape`, because no 
OpenCL calls are issued for such a node. The OpenCL timer therefore returns 0, 
which is less than the minimum time value.
   
   The `__nop` functions are now skipped for profiling.
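
   A rough sketch of why a constant or zero measurement never satisfies a 
minimum-time loop, and of the skip check, might look as follows. This is not 
the code from `profiling.cc`: `RepeatUntilMinTime`, `ShouldProfile`, the 
doubling policy and the `max_rounds` cap are assumptions made so that the 
example terminates instead of hanging.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Outcome of the toy repeat loop.
struct RepeatResult {
  bool reached_min_time;  // false if the safety cap was hit first
  int final_number;       // value of `number` when the loop stopped
};

// Grows `number` until one batch lasts at least `min_repeat_ms`.
// `measure_ms(number)` is a hypothetical callback returning the measured
// duration of `number` runs. `max_rounds` is a safety cap added for this
// sketch only; keep it small, since `number` doubles every round.
inline RepeatResult RepeatUntilMinTime(
    const std::function<double(int)>& measure_ms, double min_repeat_ms,
    int max_rounds) {
  assert(max_rounds > 0);
  int number = 1;
  for (int round = 0; round < max_rounds; ++round) {
    if (measure_ms(number) >= min_repeat_ms) return {true, number};
    number *= 2;  // assumed growth policy, for illustration
  }
  return {false, number};
}

// Skip check, assuming each node exposes its function name as a string.
inline bool ShouldProfile(const std::string& func_name) {
  return func_name != "__nop";
}
```

   With a measurement that scales with `number`, the loop exits after a few 
rounds. With one that always reports a single run, or 0 for a `__nop` node, 
only the cap stops it, which corresponds to the hang when no cap exists.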
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
