cconvey commented on a change in pull request #8952:
URL: https://github.com/apache/tvm/pull/8952#discussion_r723679943



##########
File path: python/tvm/relay/frontend/onnx.py
##########
@@ -3506,6 +3506,156 @@ def _impl_v10(cls, inputs, attr, params):
         return _qnn.op.quantize(out, c_scale, c_zero_point, out_dtype=dtype)
 
 
+class QLinearMatMul(OnnxOpConverter):
+    """
+    Operator converter for QLinearMatMul from Microsoft onnxruntime contrib 
opset.
+
+    Limitations:
+    - Only supports 2D input tensors.
+    - Not guaranteed to meet the integer-overflow behavior stipulated in the
+      ONNX documentation for this operator.
+    """
+
+    @classmethod
+    def _impl_v10(cls, inputs, attr, params):
+
+        # This function has two goals, both of which are to satisfy the input 
requirements
+        # various Relay ops used below:

Review comment:
       Typo in this comment: it should read `# ... the input requirements of various Relay ops ...` (the word "of" is missing).

##########
File path: python/tvm/relay/frontend/onnx.py
##########
@@ -3506,6 +3506,155 @@ def _impl_v10(cls, inputs, attr, params):
         return _qnn.op.quantize(out, c_scale, c_zero_point, out_dtype=dtype)
 
 
+class QLinearMatMul(OnnxOpConverter):
+    """
+    Operator converter for QLinearMatMul from Microsoft onnxruntime contrib 
opset.
+
+    Limitations:
+    - Only supports 2D input tensors.
+    - Not guaranteed to meet the integer-overflow behavior stipulated in the
+      ONNX documentation for this operator.
+    """
+
+    @classmethod
+    def _impl_v10(cls, inputs, attr, params):
+
+        # This function has two goals, both of which are to satisfy the input 
requirements
+        # various Relay ops used below:
+        #
+        # (1) If a values that's conceptually a scalar is represented as a 
tensor,
+        #     squeeze it down to just a scalar. This will always be possible 
for values
+        #     meeting the shape requirements.
+        #
+        # (2) When possible, simplify an expression down to a simple Relay 
Const node.
+        def try_convert_to_Constant(x, dtype_override=None):
+            if isinstance(x, _expr.Var) and x.name_hint in params:
+                return _op.const(params[x.name_hint].numpy(), dtype)
+
+            rank = len(infer_shape(x))
+            if rank == 0:
+                x_scalar = x
+                return x
+            elif rank == 1:
+                x_scalar = _op.squeeze(x, [0])

Review comment:
       This is copy-pasted from the `get_scalar` function, defined above by 
someone else.  I'm not sure if I can omit this without causing problems for the 
Relay ops I use further down.  Maybe this function should be named 
`try_convert_to_scalar_constant` instead.

##########
File path: python/tvm/relay/frontend/onnx.py
##########
@@ -3506,6 +3506,155 @@ def _impl_v10(cls, inputs, attr, params):
         return _qnn.op.quantize(out, c_scale, c_zero_point, out_dtype=dtype)
 
 
+class QLinearMatMul(OnnxOpConverter):
+    """
+    Operator converter for QLinearMatMul from Microsoft onnxruntime contrib 
opset.
+
+    Limitations:
+    - Only supports 2D input tensors.
+    - Not guaranteed to meet the integer-overflow behavior stipulated in the
+      ONNX documentation for this operator.
+    """
+
+    @classmethod
+    def _impl_v10(cls, inputs, attr, params):
+
+        # This function has two goals, both of which are to satisfy the input 
requirements
+        # various Relay ops used below:
+        #
+        # (1) If a values that's conceptually a scalar is represented as a 
tensor,
+        #     squeeze it down to just a scalar. This will always be possible 
for values
+        #     meeting the shape requirements.
+        #
+        # (2) When possible, simplify an expression down to a simple Relay 
Const node.
+        def try_convert_to_Constant(x, dtype_override=None):
+            if isinstance(x, _expr.Var) and x.name_hint in params:
+                return _op.const(params[x.name_hint].numpy(), dtype)
+
+            rank = len(infer_shape(x))
+            if rank == 0:
+                x_scalar = x
+                return x
+            elif rank == 1:
+                x_scalar = _op.squeeze(x, [0])
+            else:
+                assert false, "op parameter '{}' must be 
scalar".format(x.name_hint)
+
+            if dtype_override is not None:
+                return fold_constant(_op.cast(x_scalar, dtype_override))
+            else:
+                return fold_constant(x_scalar)
+
+        # Unpack the inputs and obtain some type info...
+        a, a_scale, a_zp, b, b_scale, b_zp, y_scale, y_zp = inputs
+
+        a_type = infer_type(a).checked_type  # 'T1' in ONNX doc for this op
+        a_scale_type = infer_type(a_scale).checked_type
+        a_zp_type = infer_type(a_zp).checked_type
+
+        b_type = infer_type(b).checked_type  # 'T2' in ONNX doc for this op
+        b_scale_type = infer_type(b_scale).checked_type
+        b_zp_type = infer_type(b_zp).checked_type
+
+        y_scale_type = infer_type(y_scale).checked_type
+        y_zp_type = infer_type(y_zp).checked_type  # 'T3' in ONNX doc for this 
op
+
+        a_shape = infer_shape(a)
+        b_shape = infer_shape(b)
+
+        # Verify type assumptions, based on the ONNX doc for this op...
+        assert a_type.dtype in ["int8", "uint8"]
+        assert a_scale_type.dtype == "float32"
+        assert a_zp_type.dtype == a_type.dtype
+
+        assert b_type.dtype in ["int8", "uint8"]
+        assert b_scale_type.dtype == "float32"
+        assert b_zp_type.dtype == b_type.dtype
+
+        assert y_scale_type.dtype == "float32"
+        assert y_zp_type.dtype in ["int8", "uint8"]
+
+        # TODO: relax this limitation in a future version of this importer.
+        a_rank = len(a_shape)
+        b_rank = len(b_shape)
+        assert (a_rank == 2) and (b_rank == 2), (
+            "QLinearMatMul importer currently requires both 'a' and 'b' 
tensors to be 2D, but"
+            " rank(a)={}, rank(b)={}".format(a_rank, b_rank)
+        )
+
+        # _qnn.op.dense requires the zero-point values to have dtype int32.
+        a_scale_scalar = try_convert_to_Constant(a_scale)
+        a_zp_scalar = try_convert_to_Constant(a_zp, "int32")
+
+        b_scale_scalar = try_convert_to_Constant(b_scale)
+        b_zp_scalar = try_convert_to_Constant(b_zp, "int32")
+
+        y_scale_scalar = try_convert_to_Constant(y_scale)
+        y_zp_scalar = try_convert_to_Constant(y_zp, "int32")
+
+        # TODO: Confirm that we're using 'num_hidden_units' correctly / as 
intended with
+        # the '_qnn.op.dense' instance below.
+        num_hidden_units = infer_shape(b)[-1]
+
+        # - Specify the matmul result dtype as int32, so that hopefully the 
matmul will use
+        #   a 32-bit accumulator as seems to be required by the ONNX op's 
documentation.
+        #
+        # TL;DR:
+        # The ONNX documentation for this op is clear about acceptable overflow
+        # behavior during the matmul operation:
+        #   - The scalar multiplication ops MAY NOT overflow.
+        #   - The scalar addition ops, which sum the results of the scalar 
multiplication,
+        #     MAY overflow, but if they do so, it must behave as one would 
expect during
+        #     32-bit integer-addition overflow.
+        # As of this writing, Relay's qnn.op.dense operator doesn't expose a 
way for us to

Review comment:
       I could go either way about that. The "TODO" would actually apply to 
Relay's lack of ability to express this computational requirement, so the fix 
wouldn't reside in _this_ code.

##########
File path: python/tvm/relay/frontend/onnx.py
##########
@@ -3506,6 +3506,155 @@ def _impl_v10(cls, inputs, attr, params):
         return _qnn.op.quantize(out, c_scale, c_zero_point, out_dtype=dtype)
 
 
+class QLinearMatMul(OnnxOpConverter):
+    """
+    Operator converter for QLinearMatMul from Microsoft onnxruntime contrib 
opset.
+
+    Limitations:
+    - Only supports 2D input tensors.
+    - Not guaranteed to meet the integer-overflow behavior stipulated in the
+      ONNX documentation for this operator.
+    """
+
+    @classmethod
+    def _impl_v10(cls, inputs, attr, params):
+
+        # This function has two goals, both of which are to satisfy the input 
requirements
+        # various Relay ops used below:
+        #
+        # (1) If a values that's conceptually a scalar is represented as a 
tensor,
+        #     squeeze it down to just a scalar. This will always be possible 
for values
+        #     meeting the shape requirements.
+        #
+        # (2) When possible, simplify an expression down to a simple Relay 
Const node.
+        def try_convert_to_Constant(x, dtype_override=None):
+            if isinstance(x, _expr.Var) and x.name_hint in params:
+                return _op.const(params[x.name_hint].numpy(), dtype)
+
+            rank = len(infer_shape(x))
+            if rank == 0:
+                x_scalar = x
+                return x
+            elif rank == 1:
+                x_scalar = _op.squeeze(x, [0])

Review comment:
       Actually, maybe I should separate the "make scalar" logic and "try make 
constant" logic into separate helper functions.  That's probably a better 
approach than my first idea.

##########
File path: tests/python/frontend/onnx/test_forward.py
##########
@@ -4941,7 +4941,6 @@ def verify_eyelike(indata):
     "test_mvn",
     # This test fails llvm with a lowering error:
     "test_nllloss_NCd1d2d3_none_no_weight_negative_ii_expanded",
-    "test_qlinearmatmul_2D",
     "test_qlinearmatmul_3D",

Review comment:
       Not sure.  It looks like in that test, both input tensors are 3D 
([link](https://github.com/onnx/onnx/blob/master/onnx/backend/test/case/node/qlinearmatmul.py#L47-L62)).
   
   But IIUC from the comments, `qnn.dense` only allows the second (i.e., 
weight) tensor to be 2D 
([link](https://github.com/apache/tvm/blob/main/src/relay/qnn/op/dense.cc#L210)).
   
   So if that comment is accurate, I would guess(?) that either (a) the 
importer needs to generate extra ops to make up the difference, or (b) we'd 
need to extend `qnn.dense` to handle 3D weight tensors.
   
   Unfortunately I need to finish my work on this feature ASAP, so I can't dig 
into it any deeper than this.

##########
File path: tests/python/frontend/onnx/test_forward.py
##########
@@ -4941,7 +4941,6 @@ def verify_eyelike(indata):
     "test_mvn",
     # This test fails llvm with a lowering error:
     "test_nllloss_NCd1d2d3_none_no_weight_negative_ii_expanded",
-    "test_qlinearmatmul_2D",
     "test_qlinearmatmul_3D",

Review comment:
       Not sure.  It looks like in `test_qlinearmatmul_3D`, both input tensors 
are 3D 
([link](https://github.com/onnx/onnx/blob/master/onnx/backend/test/case/node/qlinearmatmul.py#L47-L62)).
   
   But IIUC from the comments, `qnn.dense` only allows the second (i.e., 
weight) tensor to be 2D 
([link](https://github.com/apache/tvm/blob/main/src/relay/qnn/op/dense.cc#L210)).
   
   So if that comment is accurate, I would guess(?) that either (a) the 
importer needs to generate extra ops to make up the difference, or (b) we'd 
need to extend `qnn.dense` to handle 3D weight tensors.
   
   Unfortunately I need to finish my work on this feature ASAP, so I can't dig 
into it any deeper than this.

##########
File path: python/tvm/relay/frontend/onnx.py
##########
@@ -3506,6 +3506,155 @@ def _impl_v10(cls, inputs, attr, params):
         return _qnn.op.quantize(out, c_scale, c_zero_point, out_dtype=dtype)
 
 
+class QLinearMatMul(OnnxOpConverter):
+    """
+    Operator converter for QLinearMatMul from Microsoft onnxruntime contrib 
opset.
+
+    Limitations:
+    - Only supports 2D input tensors.
+    - Not guaranteed to meet the integer-overflow behavior stipulated in the
+      ONNX documentation for this operator.
+    """
+
+    @classmethod
+    def _impl_v10(cls, inputs, attr, params):
+
+        # This function has two goals, both of which are to satisfy the input 
requirements
+        # various Relay ops used below:
+        #
+        # (1) If a values that's conceptually a scalar is represented as a 
tensor,
+        #     squeeze it down to just a scalar. This will always be possible 
for values
+        #     meeting the shape requirements.
+        #
+        # (2) When possible, simplify an expression down to a simple Relay 
Const node.
+        def try_convert_to_Constant(x, dtype_override=None):
+            if isinstance(x, _expr.Var) and x.name_hint in params:
+                return _op.const(params[x.name_hint].numpy(), dtype)
+
+            rank = len(infer_shape(x))
+            if rank == 0:
+                x_scalar = x
+                return x
+            elif rank == 1:
+                x_scalar = _op.squeeze(x, [0])
+            else:
+                assert false, "op parameter '{}' must be 
scalar".format(x.name_hint)
+
+            if dtype_override is not None:
+                return fold_constant(_op.cast(x_scalar, dtype_override))
+            else:
+                return fold_constant(x_scalar)
+
+        # Unpack the inputs and obtain some type info...
+        a, a_scale, a_zp, b, b_scale, b_zp, y_scale, y_zp = inputs
+
+        a_type = infer_type(a).checked_type  # 'T1' in ONNX doc for this op
+        a_scale_type = infer_type(a_scale).checked_type
+        a_zp_type = infer_type(a_zp).checked_type
+
+        b_type = infer_type(b).checked_type  # 'T2' in ONNX doc for this op
+        b_scale_type = infer_type(b_scale).checked_type
+        b_zp_type = infer_type(b_zp).checked_type
+
+        y_scale_type = infer_type(y_scale).checked_type
+        y_zp_type = infer_type(y_zp).checked_type  # 'T3' in ONNX doc for this 
op
+
+        a_shape = infer_shape(a)
+        b_shape = infer_shape(b)
+
+        # Verify type assumptions, based on the ONNX doc for this op...
+        assert a_type.dtype in ["int8", "uint8"]
+        assert a_scale_type.dtype == "float32"
+        assert a_zp_type.dtype == a_type.dtype
+
+        assert b_type.dtype in ["int8", "uint8"]
+        assert b_scale_type.dtype == "float32"
+        assert b_zp_type.dtype == b_type.dtype
+
+        assert y_scale_type.dtype == "float32"
+        assert y_zp_type.dtype in ["int8", "uint8"]
+
+        # TODO: relax this limitation in a future version of this importer.
+        a_rank = len(a_shape)
+        b_rank = len(b_shape)
+        assert (a_rank == 2) and (b_rank == 2), (
+            "QLinearMatMul importer currently requires both 'a' and 'b' 
tensors to be 2D, but"
+            " rank(a)={}, rank(b)={}".format(a_rank, b_rank)
+        )
+
+        # _qnn.op.dense requires the zero-point values to have dtype int32.
+        a_scale_scalar = try_convert_to_Constant(a_scale)
+        a_zp_scalar = try_convert_to_Constant(a_zp, "int32")
+
+        b_scale_scalar = try_convert_to_Constant(b_scale)
+        b_zp_scalar = try_convert_to_Constant(b_zp, "int32")
+
+        y_scale_scalar = try_convert_to_Constant(y_scale)
+        y_zp_scalar = try_convert_to_Constant(y_zp, "int32")
+
+        # TODO: Confirm that we're using 'num_hidden_units' correctly / as 
intended with
+        # the '_qnn.op.dense' instance below.
+        num_hidden_units = infer_shape(b)[-1]
+
+        # - Specify the matmul result dtype as int32, so that hopefully the 
matmul will use
+        #   a 32-bit accumulator as seems to be required by the ONNX op's 
documentation.
+        #
+        # TL;DR:
+        # The ONNX documentation for this op is clear about acceptable overflow
+        # behavior during the matmul operation:
+        #   - The scalar multiplication ops MAY NOT overflow.
+        #   - The scalar addition ops, which sum the results of the scalar 
multiplication,
+        #     MAY overflow, but if they do so, it must behave as one would 
expect during
+        #     32-bit integer-addition overflow.
+        # As of this writing, Relay's qnn.op.dense operator doesn't expose a 
way for us to

Review comment:
       If you feel it's important I'll make the change, just let me know :)

##########
File path: python/tvm/relay/frontend/onnx.py
##########
@@ -3506,6 +3508,152 @@ def _impl_v10(cls, inputs, attr, params):
         return _qnn.op.quantize(out, c_scale, c_zero_point, out_dtype=dtype)
 
 
+class QLinearMatMul(OnnxOpConverter):
+    """
+    Operator converter for QLinearMatMul from Microsoft onnxruntime contrib 
opset.
+
+    Limitations:
+    - Only supports 2D input tensors.
+    - Not guaranteed to meet the integer-overflow behavior stipulated in the
+      ONNX documentation for this operator.
+    """
+
+    @classmethod
+    def _impl_v10(cls, inputs, attr, params):
+
+        # Some of the ops used below take scalar-like inputs, and may require 
either
+        # of the following:
+        #
+        # - the input is Const node (not merely an expression that *could* be 
reduced
+        #   to a single Const at graph-compilation time)
+        #
+        # - the input has a specific dtype
+        #
+        # This function attempts to present 'x' in a form that meets both of 
those
+        # requirements.
+        def try_resolve_to_const_scalar(x, dtype_override=None):
+            x2 = try_resolve_var_to_const(x, params)
+            x3 = ensure_scalar_shape(x2)
+
+            x_dtype = infer_type(x).checked_type.dtype
+            if (dtype_override is not None) and (dtype_override != x_dtype):
+                x4 = _op.cast(x3, dtype_override)
+            else:
+                x4 = x3
+
+            x5 = fold_constant(x4)
+            return x5
+
+        # Unpack the inputs and obtain some type info...
+        a, a_scale, a_zp, b, b_scale, b_zp, y_scale, y_zp = inputs
+
+        a_type = infer_type(a).checked_type  # 'T1' in ONNX doc for this op
+        a_scale_type = infer_type(a_scale).checked_type
+        a_zp_type = infer_type(a_zp).checked_type
+
+        b_type = infer_type(b).checked_type  # 'T2' in ONNX doc for this op
+        b_scale_type = infer_type(b_scale).checked_type
+        b_zp_type = infer_type(b_zp).checked_type
+
+        y_scale_type = infer_type(y_scale).checked_type
+        y_zp_type = infer_type(y_zp).checked_type  # 'T3' in ONNX doc for this 
op
+
+        a_shape = infer_shape(a)
+        b_shape = infer_shape(b)
+
+        # Verify type assumptions, based on the ONNX doc for this op...
+        assert a_type.dtype in ["int8", "uint8"]
+        assert a_scale_type.dtype == "float32"
+        assert a_zp_type.dtype == a_type.dtype
+
+        assert b_type.dtype in ["int8", "uint8"]
+        assert b_scale_type.dtype == "float32"
+        assert b_zp_type.dtype == b_type.dtype
+
+        assert y_scale_type.dtype == "float32"
+        assert y_zp_type.dtype in ["int8", "uint8"]
+
+        # TODO: relax this limitation in a future version of this importer.
+        a_rank = len(a_shape)
+        b_rank = len(b_shape)
+        assert (a_rank == 2) and (b_rank == 2), (
+            "QLinearMatMul importer currently requires both 'a' and 'b' 
tensors to be 2D, but"
+            " rank(a)={}, rank(b)={}".format(a_rank, b_rank)
+        )
+
+        # _qnn.op.dense requires the zero-point values to have dtype int32.
+        a_scale_scalar = try_resolve_to_const_scalar(a_scale)
+        a_zp_scalar = try_resolve_to_const_scalar(a_zp, "int32")
+
+        b_scale_scalar = try_resolve_to_const_scalar(b_scale)
+        b_zp_scalar = try_resolve_to_const_scalar(b_zp, "int32")
+
+        y_scale_scalar = try_resolve_to_const_scalar(y_scale)
+        y_zp_scalar = try_resolve_to_const_scalar(y_zp, "int32")
+
+        # TODO: Confirm that we're using 'num_hidden_units' correctly / as 
intended with
+        # the '_qnn.op.dense' instance below.
+        num_hidden_units = infer_shape(b)[-1]
+
+        # - Specify the matmul result dtype as int32, so that hopefully the 
matmul will use
+        #   a 32-bit accumulator as seems to be required by the ONNX op's 
documentation.
+        #
+        # TL;DR:
+        # The ONNX documentation for this op is clear about acceptable overflow
+        # behavior during the matmul operation:
+        #   - The scalar multiplication ops MAY NOT overflow.
+        #   - The scalar addition ops, which sum the results of the scalar 
multiplication,
+        #     MAY overflow, but if they do so, it must behave as one would 
expect during
+        #     32-bit integer-addition overflow.
+        # As of this writing, Relay's qnn.op.dense operator doesn't expose a 
way for us to
+        # express these constraints.
+        matmul_result_dtype = "int32"
+
+        matmul_result = _qnn.op.dense(

Review comment:
       I'm thinking that this should probably be part of a larger discussion 
about what collection of matmul-related ops are provided by Relay / Relax. If 
you don't mind I'll bring the topic up in a different forum.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to