TaylorZowtuk edited a comment on pull request #6909:
URL: https://github.com/apache/incubator-tvm/pull/6909#issuecomment-728307739


> How do you hit this part of the code? Generally, it means you have some fatal errors in the code.
> It is very rare to recover from a case where you have so many continuous errors.
   
I'm not entirely certain what causes us to hit this condition. In our case, the AutoTVM debug prints showed that it was due to error_no=4, which is a RUNTIME_DEVICE error (as you can see from the excerpt of the AutoTVM log I included previously). Hitting this condition was very intermittent: we could run a particular op/shape once and hit the condition, and then, without changing anything, it would work the next time. In addition, one op/shape reaching this condition didn't mean the rest of the op/shapes we were running in the same script would fail, so the system overall was able to recover. I think the main issue is that terminating the program as soon as we meet this condition doesn't allow any chance to recover. Additionally, we won't get this useful, precise feedback about which error we are hitting while using the auto_scheduler.
   
I'll do the rebasing and try to fix the CI issue.

