freedenS opened a new issue #10506:
URL: https://github.com/apache/tvm/issues/10506


   I tried to quantize ResNet-50 (https://github.com/onnx/models/blob/main/vision/classification/resnet/model/resnet50-v1-7.onnx) following the tutorial ([Deploy a Quantized Model on Cuda](https://tvm.apache.org/docs/how_to/deploy_models/deploy_quantized.html?highlight=quantize)). The quantize pass succeeds, but I hit a type error at inference time.
   It looks like fused_fixed_point_multiply does not accept int64 input — but then why does the graph insert a cast to int64 immediately before it?
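   For context, here is my understanding of what fixed_point_multiply computes — a plain-Python sketch of the usual TFLite-style rounding convention, not TVM's actual kernel. The op takes an int32 tensor and a Q31 fixed-point multiplier; the 64-bit product lives *inside* the op, which is presumably why its input dtype must be int32 and the extra cast to int64 looks wrong:

   ```python
   def fixed_point_multiply(x, multiplier, shift):
       """Sketch of fixed-point multiply semantics (assumed, not TVM's code):
       x is an int32 value, multiplier is an int32 Q31 fixed-point value,
       and the wide product is held in a 64-bit intermediate before being
       rounded back down to int32."""
       assert -2**31 <= x < 2**31, "input must fit in int32"
       prod = x * multiplier          # 64-bit-wide product (Python ints are unbounded)
       total_shift = 31 - shift       # undo the Q31 scaling, adjusted by `shift`
       assert total_shift > 0
       rounding = 1 << (total_shift - 1)  # round to nearest
       return (prod + rounding) >> total_shift
   ```

   With a multiplier of `1 << 30` (0.5 in Q31), `fixed_point_multiply(1000, 1 << 30, 0)` gives 500 — the point being that the widening happens internally, so the op's declared input stays int32.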
   
   ### Expected behavior
   
   Inference runs to completion and produces the correct result.
   
   ### Actual behavior
   
   ```
   ======= Execute  fused_nn_contrib_conv2d_NCHWc_41 ... 
    Output#0, CPU [1,8,56,56,8], Int8 at<0x6f07000>: 127 0 127 127 0 127 ... 
127 127 127 127 127 127 
    Output#1, CPU [8,8,1,1,8,8], Int8 at<0x4f35080>: -4 32 -10 -9 0 -1 ... -4 
-74 6 0 10 0 
    Output#2, CPU [1,8,56,56,8], Int32 at<0x8faf480>: -19050 -40132 5969 -17526 
-25908 -6985 ... -9144 -32512 -28194 -43688 3048 -29972 
   
   ======= Execute  fused_cast_1065 ... 
    Output#0, CPU [1,8,56,56,8], Int32 at<0x8faf480>: -19050 -40132 5969 -17526 
-25908 -6985 ... -9144 -32512 -28194 -43688 3048 -29972 
    Output#1, CPU [1,8,56,56,8], Int64 at<0x7527080>: -19050 -40132 5969 -17526 
-25908 -6985 ... -9144 -32512 -28194 -43688 3048 -29972 
   
   ======= Execute  fused_fixed_point_multiply_143 ... 
   Traceback (most recent call last):
   File "/tvm/python/tvm/contrib/graph_executor.py", line 248, in run
   self._run()
   File "/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
   raise get_last_ffi_error()
   tvm._ffi.base.TVMError: Traceback (most recent call last):
   12: TVMFuncCall
       at /tvm/src/runtime/c_runtime_api.cc:486
   11: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, 
tvm::runtime::TVMRetValue*) const
       at /tvm/include/tvm/runtime/packed_func.h:1150
   10: std::function<void (tvm::runtime::TVMArgs, 
tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, 
tvm::runtime::TVMRetValue*) const
       at /usr/include/c++/5/functional:2267
   9: _M_invoke
       at /usr/include/c++/5/functional:1871
   8: operator()
       at /tvm/src/runtime/graph_executor/graph_executor.cc:575
   7: tvm::runtime::GraphExecutor::Run()
       at /tvm/src/runtime/graph_executor/graph_executor.cc:68
   6: std::function<void ()>::operator()() const
       at /usr/include/c++/5/functional:2267
   5: _M_invoke
       at /usr/include/c++/5/functional:1871
   4: operator()
       at /tvm/src/runtime/graph_executor/graph_executor.cc:508
   3: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, 
tvm::runtime::TVMRetValue*) const
       at /tvm/include/tvm/runtime/packed_func.h:1150
   2: std::function<void (tvm::runtime::TVMArgs, 
tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, 
tvm::runtime::TVMRetValue*) const
       at /usr/include/c++/5/functional:2267
   1: _M_invoke
       at /usr/include/c++/5/functional:1871
   0: operator()
        at /tvm/src/runtime/library_module.cc:78
   File "/tvm/src/runtime/library_module.cc", line 78 
   TVMError: 
   ---------------------------------------------------------------
   An error occurred during the execution of TVM.
   For more information, please see: https://tvm.apache.org/docs/errors.html
   ---------------------------------------------------------------
   
     Check failed: ret == 0 (-1 vs. 0) : Assert fail: 
(((tir.tvm_struct_get(arg1, 0, 5) == (uint8)0) && (tir.tvm_struct_get(arg1, 0, 
6) == (uint8)32)) && (tir.tvm_struct_get(arg1, 0, 7) == (uint16)1)), arg1.dtype 
is expected to be int32
   ```
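   Decoding the assert, if I read the generated TIR check correctly: `tvm_struct_get` fields 5/6/7 are the DLTensor dtype's type code, bit width, and lane count, and `(0, 32, 1)` (type code 0 = kDLInt) means int32. A minimal sketch of the check — the field interpretation is my reading, not verified against the TVM source:

   ```python
   def dtype_ok(code, bits, lanes):
       # Mirrors the generated assert on arg1: the tensor must be int32,
       # i.e. type code 0 (kDLInt), 32 bits, 1 lane.
       return code == 0 and bits == 32 and lanes == 1

   assert dtype_ok(0, 32, 1)        # int32: the check passes
   assert not dtype_ok(0, 64, 1)    # int64 from fused_cast_1065: the CHECK fires
   ```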
   
   ### Environment
   Ubuntu 18.04 x86 / TVM 0.8 / Python 3.6
   
   ### Steps to reproduce
   
   ```python
   import onnx
   import tvm
   from tvm import relay
   from tvm.contrib import graph_executor

   def quantize(mod, params, data_aware):
       if data_aware:
           # calibrate_dataset() yields calibration batches (defined elsewhere)
           with relay.quantize.qconfig(calibrate_mode="kl_divergence",
                                       weight_scale="max"):
               mod = relay.quantize.quantize(mod, params,
                                             dataset=calibrate_dataset())
       else:
           with relay.quantize.qconfig(calibrate_mode="global_scale",
                                       global_scale=8.0):
               mod = relay.quantize.quantize(mod, params)
       return mod

   # onnx_path and data are set elsewhere; data is a (1, 3, 224, 224) input array
   onnx_model = onnx.load(onnx_path)
   shape_dict = {"data": (1, 3, 224, 224)}

   mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
   mod = quantize(mod, params, True)

   target_host = "llvm"
   ctx_host = tvm.cpu(0)

   with tvm.transform.PassContext(opt_level=3):
       graph, lib, params = relay.build(mod, target=target_host, params=params)

   m = graph_executor.create(graph, lib, ctx_host)
   m.set_input(**params)
   m.set_input("data", tvm.nd.array(data))
   m.run()
   ```
   

