AndrewZhaoLuo commented on a change in pull request #8952:
URL: https://github.com/apache/tvm/pull/8952#discussion_r703725015



##########
File path: python/tvm/relay/frontend/onnx.py
##########
@@ -3333,6 +3333,44 @@ def get_scalar(x, dtype="float32"):
         return _qnn.op.quantize(out, c_scale, c_zero_point, out_dtype=dtype)
 
 
+class QLinearMatMul(OnnxOpConverter):
+    """Operator converter for QLinearMatMul from Microsoft onnxruntime contrib opset."""
+
+    @classmethod
+    def _impl_v10(cls, inputs, attr, params):
+        def get_scalar(x, dtype="float32"):

Review comment:
       There are a lot of similar `get_scalar` functions across many QLinear ops. I would refactor them all into a single shared helper before it becomes more unwieldy to do so.

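The shared logic in question (accept a scalar or single-element rank-1 tensor, squeeze, cast) could live in one module-level helper. Below is a framework-free sketch of that behaviour using NumPy stand-ins rather than Relay expressions; the name `extract_scalar` and the NumPy types are illustrative only, not TVM's API:

```python
import numpy as np

def extract_scalar(x, dtype="float32"):
    """Illustrative stand-in for a shared get_scalar helper: accepts a
    scalar or a rank-1, single-element array and returns a 0-d value
    cast to the requested dtype."""
    arr = np.asarray(x)
    assert arr.ndim <= 1, "scale and zero_point inputs must be scalars"
    if arr.ndim == 1:
        arr = arr.squeeze(axis=0)  # mirrors _op.squeeze(x, [0])
    return arr.astype(dtype)

# Each QLinear converter would then call the one shared helper
# instead of redefining its own nested get_scalar.
```

The real refactor would of course operate on Relay expressions (`_expr.Var`, `_op.cast`, etc.) rather than NumPy arrays; the point is only that the rank check, squeeze, and cast are identical across the QLinear converters and can be hoisted.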
##########
File path: python/tvm/relay/frontend/onnx.py
##########
@@ -3333,6 +3333,44 @@ def get_scalar(x, dtype="float32"):
         return _qnn.op.quantize(out, c_scale, c_zero_point, out_dtype=dtype)
 
 
+class QLinearMatMul(OnnxOpConverter):
+    """Operator converter for QLinearMatMul from Microsoft onnxruntime contrib opset."""
+
+    @classmethod
+    def _impl_v10(cls, inputs, attr, params):
+        def get_scalar(x, dtype="float32"):
+            if isinstance(x, _expr.Var) and x.name_hint in params:
+                return _op.const(params[x.name_hint].numpy(), dtype)
+            rank = len(infer_shape(x))
+            assert rank <= 1, "QLinearMatMul scale and zero_point input must be scalars"
+            if rank == 1:
+                x = _op.squeeze(x, [0])
+            return _op.cast(x, dtype)
+
+        a = inputs[0]
+        a_scale = get_scalar(inputs[1])
+        a_zero_point = get_scalar(inputs[2], "int32")
+
+        b = inputs[3]
+        b_scale = get_scalar(inputs[4])
+        b_zero_point = get_scalar(inputs[5], "int32")
+
+        y_scale = fold_constant(get_scalar(inputs[6]))
+        y_zero_point = get_scalar(inputs[7], "int32")
+
+        dtype = infer_type(a).checked_type.dtype
+
+        a_rank = len(infer_shape(a))
+        b_rank = len(infer_shape(b))
+
+        assert (a_rank == 2) and (b_rank == 2), "QLinearMatMul importer currently requires both 'a' and 'b' tensors to be 2D, but rank(a)={}, rank(b)={}".format(a_rank, b_rank)
+
+        a = _qnn.op.dequantize(inputs[0], a_scale, a_zero_point)
+        b = _qnn.op.dequantize(inputs[3], b_scale, b_zero_point)
+        out = _op.nn.matmul(a, b)

Review comment:
       For QLinearMul, there's a comment noting that the ONNX implementation dequantizes and then requantizes. Can you add a similar comment here so readers know why QLinearMatMul is implemented this way?
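For context, the dequantize → float matmul → requantize lowering mirrors the reference semantics of quantized matmul. A NumPy sketch of that behaviour is below; the scales and zero points are chosen arbitrarily for illustration, and uint8 output with symmetric clipping is assumed:

```python
import numpy as np

def qlinear_matmul_ref(a, a_scale, a_zp, b, b_scale, b_zp, y_scale, y_zp):
    """Reference semantics: dequantize both operands to float,
    matmul in float, then requantize the result back to uint8."""
    a_f = (a.astype(np.int32) - a_zp) * a_scale   # dequantize a
    b_f = (b.astype(np.int32) - b_zp) * b_scale   # dequantize b
    y_f = a_f @ b_f                               # float matmul
    y_q = np.round(y_f / y_scale) + y_zp          # requantize
    return np.clip(y_q, 0, 255).astype(np.uint8)

a = np.array([[130, 132], [128, 129]], dtype=np.uint8)
b = np.array([[127, 128], [129, 126]], dtype=np.uint8)
y = qlinear_matmul_ref(a, 0.1, 128, b, 0.2, 128, 0.05, 128)
```

The converter in this PR implements exactly this pipeline via `_qnn.op.dequantize`, `_op.nn.matmul`, and (presumably, further down the hunk) `_qnn.op.quantize`, which is why a one-line comment citing the reference semantics would make the intent obvious to future readers.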




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

