TaylorZowtuk opened a new pull request #6909:
URL: https://github.com/apache/incubator-tvm/pull/6909


   While running scripts that use both AutoScheduler and AutoTVM to search consecutively for schedules for a number of operators/shapes, I observed different behaviors during measurement following the output “Too many errors happened during tuning.”
   
   After looking into the code, I determined that the difference arises because AutoScheduler and AutoTVM handle differently the case where the number of errors accumulated during measurement exceeds a threshold.
   
   With AutoTVM, the program switched to debug-level logging and continued the search:
   ```
   Too many errors happen in the tuning. Now is in debug mode
   No: 217      GFLOPS: 0.00/0.00       result: MeasureResult(costs=(RuntimeError('Traceback (most recent call last):\n  [bt] (5) /home/tanvir/tvm/build/libtvm.so(TVMFuncCall+0x63) [0x7fd9b685ee13]\n  [bt] (4) /home/tanvir/tvm/build/libtvm.so(+0x1309037) [0x7fd9b68c8037]\n  [bt] (3) /home/tanvir/tvm/build/libtvm.so(tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0x3fa) [0x7fd9b68cc86a]\n  [bt] (2) /home/tanvir/tvm/build/libtvm.so(tvm::runtime::RPCClientSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)+0x57) [0x7fd9b68c0217]\n  [bt] (1) /home/tanvir/tvm/build/libtvm.so(tvm::runtime::RPCEndpoint::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)>)+0x6bd) [0x7fd9b68b546d]\n  [bt] (0) /home/tanvir/tvm/build/libtvm.so(+0x12f3668) [0x7fd9b68b2668]\n  File "/home/tanvir/tvm/src/runtime/rpc/rpc_endpoint.cc", line 807\nTVMError: Check failed: code == RPCCode::kReturn: code=1'),), error_no=4, all_cost=10.765872716903687, timestamp=1604092331.4940712)   [('tile_f', [-1, 16]), ('tile_y', [-1, 2]), ('tile_x', [-1, 2]), ('tile_z', [-1, 16])],None,1719
   …
   <continues>
   ```
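   The AutoTVM behavior shown in the log can be sketched with Python's standard `logging` module. This is a minimal illustration, not AutoTVM's actual code: the `tune` function, its `max_errors` threshold, and the use of an accumulated (never reset) error count are assumptions; only the warning text is copied from the log above.

```python
import logging


def tune(results, logger, max_errors):
    """Consume measurement outcomes (True = success, False = error).

    Hypothetical sketch: once the accumulated error count exceeds
    max_errors, switch the logger to DEBUG and keep tuning instead of
    aborting.
    """
    error_ct = 0
    measured = 0
    for ok in results:
        measured += 1
        if not ok:
            error_ct += 1
        if error_ct > max_errors and logger.level != logging.DEBUG:
            logger.warning("Too many errors happen in the tuning. Now is in debug mode")
            logger.setLevel(logging.DEBUG)
        # Filtered out by the level check until the switch above
        # (the root logger defaults to WARNING).
        logger.debug("No: %d ok=%s", measured, ok)
    # Every result is consumed; the search never stops early.
    return measured
```

   Calling `logging.basicConfig()` beforehand attaches a stderr handler so the debug lines actually become visible once the switch happens.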
   With AutoScheduler, by contrast, the program crashed with an uncaught error:
   ```
   Traceback (most recent call last):
     …
     File "runner.py", line 124, in fig_6
       m = run_operator(
     File "runner.py", line 58, in run_operator
       sch, args = auto_scheduler.auto_schedule(task, tuning_options=tune_option)
     File "/home/taylor/tvm/python/tvm/auto_scheduler/auto_schedule.py", line 213, in auto_schedule
       sch, tensors = _ffi_api.AutoSchedule(search_policy, tuning_options)
     File "/home/taylor/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
       raise get_last_ffi_error()
   tvm._ffi.base.TVMError: Traceback (most recent call last):
     [bt] (5) /home/taylor/tvm/build/libtvm.so(TVMFuncCall+0x63) [0x7f11e187d7b3]
     [bt] (4) /home/taylor/tvm/build/libtvm.so(+0x6965ab) [0x7f11e0c755ab]
     [bt] (3) /home/taylor/tvm/build/libtvm.so(tvm::auto_scheduler::AutoSchedule(tvm::auto_scheduler::SearchPolicy, tvm::auto_scheduler::TuningOptions)+0x11a) [0x7f11e0c74cca]
     [bt] (2) /home/taylor/tvm/build/libtvm.so(tvm::auto_scheduler::SketchPolicyNode::Search(int, int, int, tvm::auto_scheduler::ProgramMeasurer)+0x760) [0x7f11e0cfb3d0]
     [bt] (1) /home/taylor/tvm/build/libtvm.so(tvm::auto_scheduler::ProgramMeasurerNode::Measure(tvm::auto_scheduler::SearchTask const&, tvm::auto_scheduler::SearchPolicy const&, tvm::runtime::Array<tvm::auto_scheduler::MeasureInput, void> const&, tvm::runtime::Array<tvm::auto_scheduler::MeasureResult, void>*, int)+0x11ed) [0x7f11e0cd7b2d]
     [bt] (0) /home/taylor/tvm/build/libtvm.so(+0x6f4af8) [0x7f11e0cd3af8]
     File "/home/taylor/tvm/src/auto_scheduler/measure.cc", line 268
   TVMError: Too many errors happened during tuning
   ```
   
   In my case, because AutoScheduler crashed instead of continuing the search, my script terminated prematurely, even though the search might have recovered from whatever was causing the measurement errors.

   I was also unsure why this behavior occurred only in AutoScheduler and not in AutoTVM. The discrepancy can confuse new users who want to explore both approaches to schedule search. This PR proposes bringing AutoScheduler's handling of measurement errors in line with AutoTVM's.
   
   This PR removes the `LOG(FATAL)` and instead changes AutoScheduler's verbosity in the same way AutoTVM changes its logging level, so that both tools behave the same way. It also changes AutoScheduler's default verbosity to 0 (silent) to match AutoTVM's default logging level.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
