MPolaris opened a new issue, #16646: URL: https://github.com/apache/tvm/issues/16646
In the QNN frontend, the way the `zero_point` parameter is read appears to be incorrect. In the `_qnn_conv2d_legalize_cuda` function, uint8 data is shifted to int8, but the shift only handles the case where `zero_point` is a scalar, i.e. per-tensor quantization. With per-channel quantization, `zero_point` is a 1-D array, and that case is not handled. I have submitted a [PR](https://github.com/apache/tvm/pull/16479) to fix this issue. The bug can be reproduced with the following code:

```python
import numpy as np
import onnx
import tvm
from tvm import relay

input_tensor = onnx.helper.make_tensor_value_info('input', onnx.TensorProto.FLOAT, [1, 3, 224, 224])
output_tensor = onnx.helper.make_tensor_value_info('output', onnx.TensorProto.FLOAT, [1, 3, 112, 112])
input_q_info = onnx.helper.make_tensor_value_info('input_q', onnx.TensorProto.UINT8, [1, 3, 224, 224])
conv_q_info = onnx.helper.make_tensor_value_info('conv_q', onnx.TensorProto.UINT8, [1, 3, 112, 112])

# Per-tensor (scalar) quantization parameters for the activation input and the conv output
q1_scale = onnx.helper.make_tensor('q1_scale', onnx.TensorProto.FLOAT, [], [1])
q1_zero_point = onnx.helper.make_tensor('q1_zero_point', onnx.TensorProto.UINT8, [], [0])
q2_scale = onnx.helper.make_tensor('q2_scale', onnx.TensorProto.FLOAT, [], [1])
q2_zero_point = onnx.helper.make_tensor('q2_zero_point', onnx.TensorProto.UINT8, [], [0])

# uint8 weights with per-channel (1-D, one entry per output channel) scale and zero point
weight = onnx.helper.make_tensor('weight', onnx.TensorProto.UINT8, [3, 3, 3, 3],
                                 np.random.randint(0, 255, (3, 3, 3, 3)).astype(np.uint8))
bias = onnx.helper.make_tensor('bias', onnx.TensorProto.INT32, [3], np.random.randn(3).astype(np.int32))
w_scale = onnx.helper.make_tensor('w_scale', onnx.TensorProto.FLOAT, [3], [1, 2, 3])
w_zero_point = onnx.helper.make_tensor('w_zero_point', onnx.TensorProto.UINT8, [3], [1, 2, 3])

input_q = onnx.helper.make_node('QuantizeLinear', ['input', 'q1_scale', 'q1_zero_point'],
                                ['input_q'], name='input_quantize')
attrs = {
    "dilations": [1, 1],
    "group": 1,
    "kernel_shape": [3, 3],
    "pads": [1, 1, 1, 1],
    "strides": [2, 2],
}
conv = onnx.helper.make_node(
    'QLinearConv',
    ['input_q', 'q1_scale', 'q1_zero_point',
     'weight', 'w_scale', 'w_zero_point',
     'q2_scale', 'q2_zero_point', 'bias'],
    ['conv_q'],
    name='conv',
    **attrs,
)
output = onnx.helper.make_node('DequantizeLinear', ['conv_q', 'q2_scale', 'q2_zero_point'],
                               ['output'], name='output_dequantize')

graph = onnx.helper.make_graph(
    [input_q, conv, output],
    'quantized_graph',
    [input_tensor],
    [output_tensor],
    initializer=[q1_scale, q1_zero_point, q2_scale, q2_zero_point, weight, bias, w_scale, w_zero_point],
    value_info=[input_q_info, conv_q_info],
)
model = onnx.helper.make_model(graph, opset_imports=[onnx.helper.make_opsetid("com.microsoft", 1),
                                                     onnx.helper.make_opsetid("", 11)])
model_name = "./quantized.onnx"
onnx.save_model(model, model_name)

# Import into Relay and build for CUDA; the per-channel w_zero_point triggers the bug
onnx_model = onnx.load(model_name)
mod, params = relay.frontend.from_onnx(onnx_model)
target = "cuda"
with tvm.transform.PassContext(opt_level=3):
    executor = relay.build_module.create_executor(
        "graph", mod, tvm.cuda(0), target, params
    ).evaluate()
```
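To illustrate why a per-channel `zero_point` should be handled by the shift, here is a minimal NumPy sketch, not TVM's actual implementation. It assumes the uint8-to-int8 legalization subtracts a fixed offset of 128 from both the data and its zero point; the helpers `shift_uint8_to_int8` and `dequantize` are hypothetical names used only for illustration. Since the real value is `scale * (q - zero_point)`, subtracting the same constant from `q` and `zero_point` leaves it unchanged, and this works element-wise whether `zero_point` is a scalar (per-tensor) or a 1-D array (per-channel).

```python
import numpy as np


def shift_uint8_to_int8(data, zero_point):
    """Hypothetical helper: re-express uint8 quantized values as int8.

    Assumes a fixed offset of 128. Subtracting it from both the data and the
    zero point keeps (data - zero_point), and thus the real value, unchanged.
    `zero_point` may be a scalar (per-tensor) or a 1-D array (per-channel).
    """
    # uint8 range [0, 255] maps exactly onto int8 range [-128, 127]
    shifted_data = (data.astype(np.int32) - 128).astype(np.int8)
    # np.asarray accepts both a scalar and a 1-D per-channel array
    shifted_zp = np.asarray(zero_point, dtype=np.int32) - 128
    return shifted_data, shifted_zp


def _per_axis(param, ndim, axis, dtype):
    """Reshape a 1-D per-channel parameter so it broadcasts along `axis`."""
    p = np.asarray(param, dtype=dtype)
    if p.ndim == 1:
        shape = [1] * ndim
        shape[axis] = -1
        p = p.reshape(shape)
    return p


def dequantize(q, scale, zero_point, axis=0):
    """Hypothetical reference dequantization: scale * (q - zero_point)."""
    s = _per_axis(scale, q.ndim, axis, np.float32)
    zp = _per_axis(zero_point, q.ndim, axis, np.int32)
    return s * (q.astype(np.int32) - zp)


# Per-channel weights shaped like the repro (output channels on axis 0)
rng = np.random.default_rng(0)
weight = rng.integers(0, 256, size=(3, 3, 3, 3)).astype(np.uint8)
w_scale = np.array([1.0, 2.0, 3.0])
w_zero_point = np.array([1, 2, 3], dtype=np.uint8)

w_int8, zp_int8 = shift_uint8_to_int8(weight, w_zero_point)
np.testing.assert_allclose(dequantize(weight, w_scale, w_zero_point),
                           dequantize(w_int8, w_scale, zp_int8))
```

The design choice here is to treat the zero point as an array from the start, so the same code path covers both the per-tensor and the per-channel case instead of extracting a single scalar value.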
