AndrewZhaoLuo commented on a change in pull request #8952:
URL: https://github.com/apache/tvm/pull/8952#discussion_r703725015
##########
File path: python/tvm/relay/frontend/onnx.py
##########
@@ -3333,6 +3333,44 @@ def get_scalar(x, dtype="float32"):
return _qnn.op.quantize(out, c_scale, c_zero_point, out_dtype=dtype)
+class QLinearMatMul(OnnxOpConverter):
+ """Operator converter for QLinearMatMul from Microsoft onnxruntime contrib
opset."""
+
+ @classmethod
+ def _impl_v10(cls, inputs, attr, params):
+ def get_scalar(x, dtype="float32"):
Review comment:
There are a lot of similar `get_scalar` functions across many QLinear ops.
I would refactor all of these into a common helper before it gets any more
unwieldy to do so.
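The refactor suggested above could look something like the following. This is a hypothetical standalone sketch (using numpy in place of Relay expressions, so the names and behavior are illustrative assumptions, not the actual TVM API): a single shared helper that accepts a 0-D value or a length-1 1-D tensor and returns it as a scalar of the requested dtype, mirroring the rank check and squeeze in the diff.

```python
import numpy as np

def get_scalar(x, dtype="float32"):
    # Hypothetical shared helper sketch: accept a 0-D array or a
    # length-1 1-D array and return a 0-D scalar of the given dtype,
    # analogous to the rank check + _op.squeeze(x, [0]) in the diff.
    x = np.asarray(x)
    assert x.ndim <= 1, "scale and zero_point inputs must be scalars"
    if x.ndim == 1:
        x = np.squeeze(x, axis=0)
    return x.astype(dtype)
```

Each QLinear converter could then call this one helper instead of redefining its own nested copy.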
##########
File path: python/tvm/relay/frontend/onnx.py
##########
@@ -3333,6 +3333,44 @@ def get_scalar(x, dtype="float32"):
return _qnn.op.quantize(out, c_scale, c_zero_point, out_dtype=dtype)
+class QLinearMatMul(OnnxOpConverter):
+ """Operator converter for QLinearMatMul from Microsoft onnxruntime contrib
opset."""
+
+ @classmethod
+ def _impl_v10(cls, inputs, attr, params):
+ def get_scalar(x, dtype="float32"):
+ if isinstance(x, _expr.Var) and x.name_hint in params:
+ return _op.const(params[x.name_hint].numpy(), dtype)
+ rank = len(infer_shape(x))
+ assert rank <= 1, "QLinearMatMul scale and zero_point inputs must be scalars"
+ if rank == 1:
+ x = _op.squeeze(x, [0])
+ return _op.cast(x, dtype)
+
+ a = inputs[0]
+ a_scale = get_scalar(inputs[1])
+ a_zero_point = get_scalar(inputs[2], "int32")
+
+ b = inputs[3]
+ b_scale = get_scalar(inputs[4])
+ b_zero_point = get_scalar(inputs[5], "int32")
+
+ y_scale = fold_constant(get_scalar(inputs[6]))
+ y_zero_point = get_scalar(inputs[7], "int32")
+
+ dtype = infer_type(a).checked_type.dtype
+
+ a_rank = len(infer_shape(a))
+ b_rank = len(infer_shape(b))
+
+ assert (a_rank == 2) and (b_rank == 2), "QLinearMatMul importer currently requires both 'a' and 'b' tensors to be 2D, but rank(a)={}, rank(b)={}".format(a_rank, b_rank)
+
+ a = _qnn.op.dequantize(inputs[0], a_scale, a_zero_point)
+ b = _qnn.op.dequantize(inputs[3], b_scale, b_zero_point)
+ out = _op.nn.matmul(a, b)
Review comment:
For QLinearMul, there's a comment noting that the ONNX reference
implementation dequantizes and then requantizes. Can you add a similar
reference here so people know why QLinearMatMul is implemented this way?
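For context, the dequantize-then-requantize lowering being discussed can be sketched numerically as follows. This is an illustrative numpy reference (not TVM code); the uint8 output range and the rounding scheme here are assumptions for the sketch, since in the actual converter the output dtype comes from `infer_type(a)` and requantization is done by `_qnn.op.quantize`.

```python
import numpy as np

def qlinear_matmul_ref(a, a_scale, a_zp, b, b_scale, b_zp, y_scale, y_zp):
    # Sketch of the lowering used by the converter, mirroring the ONNX
    # reference semantics: dequantize inputs to float, do a float matmul,
    # then requantize the result.
    a_f = (a.astype("int32") - a_zp) * a_scale  # dequantize 'a'
    b_f = (b.astype("int32") - b_zp) * b_scale  # dequantize 'b'
    y_f = a_f @ b_f                             # float matmul
    y_q = np.round(y_f / y_scale) + y_zp        # requantize
    return np.clip(y_q, 0, 255).astype("uint8")  # assumed uint8 output
```

Writing the op this way trades some precision for simplicity: it reuses existing dequantize/quantize lowering rather than implementing an integer-only matmul path.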
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]