[GitHub] [tvm] gbxu commented on issue #8991: [Bug] AutoScheduler not work when kernel contains conv2d+gradient and has a large size

GitBox Sat, 06 Nov 2021 06:04:14 -0700


gbxu commented on issue #8991:
URL: https://github.com/apache/tvm/issues/8991#issuecomment-962448719



   Weird. When I use auto_scheduler to search for a conv2d schedule on a V100, it prints "No valid state found in this search round. Check if it has traversed all of the search space." along with many messages like
   ```
   No: 3        GFLOPS: 0.00 / 0.00     results: MeasureResult(error_type:RuntimeDeviceError, error_msg:Traceback (most recent call last):
     File "/home/test/tvm/python/tvm/auto_scheduler/measure.py", line 1124, in _rpc_run
       random_fill(empty_array)
     File "/home/test/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
       rai
   ...
   ----------
   An error occurred during the execution of TVM.
   ```
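
The first traceback above is cut after `rai` and continues with `...`, which is what head-and-tail truncation of a long string produces. A minimal sketch of that pattern, assuming a character limit is applied to the whole traceback text; the helper name `truncate_middle` and its parameters are hypothetical and this is not TVM's code:

```python
def truncate_middle(text: str, max_len: int, marker: str = "\n...\n") -> str:
    """Keep the first and last max_len // 2 characters of an over-long string."""
    if len(text) <= max_len:
        return text  # short enough, nothing to cut
    half = max_len // 2
    # Keep the head and the tail, drop the middle, and mark the cut.
    return text[:half] + marker + text[len(text) - half:]


if __name__ == "__main__":
    long_message = "Traceback (most recent call last): " + "x" * 1000 + " end of error"
    print(truncate_middle(long_message, 64))
```

With a small limit the cut can land in the middle of a word, as it does in `rai`; a larger limit keeps more of the message.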
   My workload is as follows.
   ```
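   import tvm
   from tvm import auto_scheduler, topi

   @auto_scheduler.register_workload
   def my_workload():
       # NCHW input (batch 128, 64 channels, 224x224) and 64x64 3x3 weights
       A = tvm.te.placeholder((128, 64, 224, 224), name='input0')
       W = tvm.te.placeholder((64, 64, 3, 3), name='input1')
       # stride 1, padding 1, dilation 1; output shape is (128, 64, 224, 224)
       C = topi.nn.conv2d(A, W, (1,1), (1,1), (1,1), layout='NCHW', out_dtype=A.dtype)
       return [A, W, C]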
   ```
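
For reference, a rough size estimate for this workload. This is a plain-Python sketch, not part of the failing script; it assumes float32 tensors (the default placeholder dtype) and uses the standard conv2d output-size formula. The helper names `conv2d_nchw_stats` and `tensor_bytes` are hypothetical.

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # assumption: all tensors are float32


def conv2d_nchw_stats(data_shape, kernel_shape, stride=1, padding=1, dilation=1):
    """Return (output_shape, flops) for an NCHW conv2d with square stride/padding."""
    n, c_in, h, w = data_shape
    c_out, _, kh, kw = kernel_shape
    out_h = (h + 2 * padding - dilation * (kh - 1) - 1) // stride + 1
    out_w = (w + 2 * padding - dilation * (kw - 1) - 1) // stride + 1
    out_shape = (n, c_out, out_h, out_w)
    # One multiply and one add per output element per reduction step.
    flops = 2 * prod(out_shape) * c_in * kh * kw
    return out_shape, flops


def tensor_bytes(shape):
    """Size of a dense float32 tensor in bytes."""
    return prod(shape) * BYTES_PER_FLOAT32


if __name__ == "__main__":
    data, kernel = (128, 64, 224, 224), (64, 64, 3, 3)
    out_shape, flops = conv2d_nchw_stats(data, kernel)
    gib = 1024 ** 3
    print("output shape:", out_shape)
    print("input  GiB:", tensor_bytes(data) / gib)
    print("output GiB:", tensor_bytes(out_shape) / gib)
    print("weight bytes:", tensor_bytes(kernel))
    print("GFLOP per run:", flops / 1e9)
```

The input and output buffers are the same size and together dominate the footprint; the weights are negligible. The estimate alone does not show whether size is what makes the measurement fail.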
   
   After setting MAX_TRACEBACK_INFO_LEN=8192, I can get more details:
   ```
   Get 64 programs to measure:
   ........................................................
   
*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E*E==================================================
   No: 1        GFLOPS: 0.00 / 0.00     results: MeasureResult(error_type:RuntimeDeviceError, error_msg:Traceback (most recent call last):
     File "/home/test/tvm/python/tvm/auto_scheduler/measure.py", line 1124, in _rpc_run
       random_fill(empty_array)
     File "/home/test/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
       raise get_last_ffi_error()
   tvm._ffi.base.TVMError: Traceback (most recent call last):
     4: TVMFuncCall
     3: _ZNSt17_Function_handlerIFvN3tvm7runtime7TVMArgsEPNS1_11
     2: tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
     1: tvm::runtime::RPCClientSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)
     0: tvm::runtime::RPCEndpoint::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)>)
     File "/home/test/tvm/src/runtime/rpc/rpc_endpoint.cc", line 801
   TVMError: 
   ---------------------------------------------------------------
   An error occurred during the execution of TVM.
   For more information, please see: https://tvm.apache.org/docs/errors.html
   ---------------------------------------------------------------
     Check failed: (code == RPCCode::kReturn) is false: code=kShutdown
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
     File "/home/test/tvm/python/tvm/auto_scheduler/measure.py", line 1185, in _rpc_run_worker
       res = _rpc_run(*args)
     File "/home/test/tvm/python/tvm/auto_scheduler/measure.py", line 1143, in _rpc_run
       dev.free_raw_stream(stream)
     File "/home/test/tvm/python/tvm/_ffi/runtime_ctypes.py", line 456, in free_raw_stream
       check_call(_LIB.TVMStreamFree(self.device_type, self.device_id, stream))
     File "/home/test/tvm/python/tvm/_ffi/base.py", line 348, in check_call
       raise get_last_ffi_error()
   tvm._ffi.base.TVMError: Traceback (most recent call last):
     42: 0xffffffffffffffff
     41: _start
     40: __libc_start_main
     39: main
     38: Py_Main
     37: 0x000000000063886a
     36: PyObject_Call
     35: 0x000000000058945c
     34: 0x0000000000507cd3
     33: _PyEval_EvalFrameDefault
     32: 0x000000000050a3fc
     31: 0x00000000005099ff
     30: 0x0000000000507cd3
     29: _PyEval_EvalFrameDefault
     28: 0x000000000050a22e
     27: 0x0000000000516284
     26: 0x0000000000507cd3
     25: _PyEval_EvalFrameDefault
     24: 0x000000000050a3fc
     23: 0x00000000005099ff
     22: 0x0000000000507cd3
     21: _PyEval_EvalFrameDefault
     20: PyObject_Call
     19: 0x00000000005893d9
     18: 0x0000000000507cd3
     17: _PyEval_EvalFrameDefault
     16: PyObject_Call
     15: 0x000000000058931a
     14: 0x0000000000507cd3
     13: _PyEval_EvalFrameDefault
     12: 0x000000000050a3fc
     11: 0x00000000005096c7
     10: _PyEval_EvalFrameDefault
     9: 0x000000000050a532
     8: _PyObject_FastCallKeywords
     7: 0x00007f2b9b1f5c12
     6: _ctypes_callproc
     5: ffi_call
     4: ffi_call_unix64
     3: TVMStreamFree
     2: tvm::runtime::RPCDeviceAPI::FreeStream(DLDevice, void*)
     1: non-virtual thunk to tvm::runtime::RPCClientSession::FreeStream(DLDevice, void*)
     0: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::RPCEndpoint::Init()::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
     File "/home/test/tvm/src/runtime/rpc/rpc_endpoint.cc", line 681
   TVMError: 
   ---------------------------------------------------------------
   An error occurred during the execution of TVM.
   For more information, please see: https://tvm.apache.org/docs/errors.html
   ---------------------------------------------------------------
     Check failed: (code == RPCCode::kReturn) is false: code=1
   , all_cost:11.04, Tstamp:1636202883.20)
   ==================================================
   Placeholder: input0, input1
   blockIdx.x [email protected]@[email protected]@ (0,16384)
     vthread [email protected]@[email protected]@ (0,4)
       threadIdx.x [email protected]@[email protected]@ (0,64)
         for rc.0 (0,64)
           for rx.0 (0,3)
             threadIdx.x ax0@ax1@ax2@[email protected] (0,64)
               input1.shared = ...
             for ax0@ax1@ax2@[email protected] (0,126)
               threadIdx.x ax0@ax1@ax2@[email protected] (0,64)
                 pad_temp.shared = ...
             for ry.1 (0,3)
               for nn_c.3 (0,2)
                 for yy_c.4 (0,7)
                   for xx_c.4 (0,7)
                     compute.local = ...
         for nn.3 (0,2)
           for yy.3 (0,7)
             for xx.3 (0,7)
               compute = ...
   
   ==================================================
   
   ```
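
As a sanity check on the printed state, the loop extents can be multiplied out and compared with the workload shape. This is a small sketch in plain Python; the extents are copied by hand from the log above, and the variable names are only for illustration.

```python
from math import prod

# Spatial loop extents read from the printed state (outer to inner).
spatial_extents = {
    "blockIdx.x": 16384,
    "vthread": 4,
    "threadIdx.x": 64,
    "nn.3": 2,
    "yy.3": 7,
    "xx.3": 7,
}
# Reduction loop extents read from the same state.
reduction_extents = {"rc.0": 64, "rx.0": 3, "ry.1": 3}

output_shape = (128, 64, 224, 224)  # N, C_out, H, W of the conv2d result
reduction_size = 64 * 3 * 3         # C_in * KH * KW

covered_outputs = prod(spatial_extents.values())
covered_reduction = prod(reduction_extents.values())

print("spatial iterations:", covered_outputs, "expected:", prod(output_shape))
print("reduction iterations:", covered_reduction, "expected:", reduction_size)
```

Both products match the workload, so the sampled schedule covers the whole iteration space; the check says nothing about why the remote run shuts down.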
   Hi @comaniac, is this caused by the large problem size of the conv2d? I think this is a common setting for CNN models.


