areusch commented on code in PR #13752:
URL: https://github.com/apache/tvm/pull/13752#discussion_r1086036906


##########
python/tvm/relay/qnn/op/_qnn.py:
##########
@@ -85,12 +91,72 @@ def simulated_dequantize_compute(attrs, inputs, output_type):
 register_strategy("qnn.conv2d", strategy.qnn_conv2d_strategy)
 
 
+def _get_clip_dtype_bounds(dtype):
+    """Returns the minimum and maximum values of a C integer data type."""
+    assert "int" in dtype
+    bits = int(dtype[dtype.find("int") + 3 :])
+
+    if dtype.startswith("int"):
+        return (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
+    elif dtype.startswith("uint"):
+        return (0, 2**bits - 1)
+    else:
+        raise TVMError(f"Clip legalization is not supported for data type '{dtype}'!")
+
+
+@register_legalize("clip")
+def legalize_clip(attrs, inputs, tinfos):

Review Comment:
   are there tests for these?



##########
python/tvm/topi/arm_cpu/qnn_alter_op.py:
##########
@@ -19,57 +19,108 @@
 import numpy as np
 
 from tvm import nd, relay, target
-from ..nn import qnn_requantize_alter_layout, qnn_add_alter_layout
+from ..utils import get_const_tuple
+from ..nn import qnn_conv2d_alter_layout, qnn_add_alter_layout, qnn_requantize_alter_layout
 
 
-@qnn_requantize_alter_layout.register(["arm_cpu"])
-def alter_requantize_layout(attrs, inputs, _tinfos, _out_type):
-    """Changes a floating point requantize op to use int64 multiply + shift for microTVM.
+def prev_ops_match(curr_op, pattern):

Review Comment:
   can you write a docstring here?



##########
src/target/source/codegen_c.cc:
##########
@@ -631,8 +632,10 @@ void CodeGenC::PrintVecBinaryOp(const std::string& op, DataType t, PrimExpr lhs,
   }
 }
 
+NameSupply global_name_supply = NameSupply("");
 void CodeGenC::VisitStmt_(const AllocateConstNode* op) {
-  std::string symbol_name = op->buffer_var->name_hint;
+  std::string symbol_name = global_name_supply->FreshName(op->buffer_var->name_hint);

Review Comment:
   why was this needed? I think we need to initialize the NameSupply from the
IRModule.



##########
python/tvm/topi/arm_cpu/qnn_legalize.py:
##########
@@ -0,0 +1,349 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""QNN legalization transforms that help eliminate sparse channels.
+
+Some models (like MobileNetV1 when fine-tuned) have output channels in their 
kernels which are
+completely full of zeros. Sometimes these can be optimized away by the C 
compiler, but this does not
+happen when complex schedules (like the ACLE tensordot convolutions) are used.
+
+Instead, we will remove these channels by replacing blocks of operators with 
equivalent "denser"
+ones during legalization. This is harder than it looks - while the outputs of 
channels with all-zero
+kernels do not depend on the input data, they are usually not zero. We work 
around this by computing
+how these constant values affect subsequent operators, and "folding" these 
effects into a bias_add.
+
+It would eventually be nice to have a generalized, cross-target solution for 
removing zero channels,
+as there is no downside. This may be possible with Relax, but I'm unsure.
+"""
+
+import numpy as np
+from scipy.signal import convolve2d
+from tvm.topi.utils import get_const_tuple
+from tvm import nd, relay
+from .qnn_alter_op import prev_ops_match, edit_attrs
+from ..nn import qnn_bias_add_legalize
+
+
+def _compute_fixed_conv2d_outputs(requantize_op):
+    """Compute all conv2d output values that do not depend on the layer input."""

Review Comment:
   could you document the return value, here and below?



##########
tests/python/relay/strategy/arm_cpu/test_quantized_convolution.py:
##########
@@ -95,6 +95,53 @@ def _get_mobilenet_v1_layer_attributes(layer_num):
     return ((1, 1, 1, 1), (1, 1), True)
 
 
[email protected]("layer", range(2, 27, 2))
[email protected]_package("tensorflow")
+def test_empty_channel_detection(interpreter, layer):
+    """Some models (mainly MobileNetV1) have kernels with many output channels 
full entirely of
+    zeroes. The VWW model is one of these. This test confirms that the outputs 
of these channels,
+    as computed by TensorFlow, are indeed not dependent upon the input values.
+    """
+
+    _, kernel, bias, output = _load_tflite_layer(interpreter, layer)
+    kernel_data, _ = kernel
+    bias_data, bias_quant = bias
+    output_data, output_quant = output
+    is_depthwise = _get_mobilenet_v1_layer_attributes(layer)[2]
+    assert not is_depthwise
+    assert kernel_data.shape[1] == kernel_data.shape[2] == 1
+
+    out_channels = kernel_data.shape[3]
+    fixed_channels = {}
+
+    out_zero_point = output_quant["zero_points"][0]
+    assert out_zero_point == -128
+
+    for i in range(out_channels):
+        # Skip over output channels with data
+        if np.any(kernel_data[i, 0, 0, :]):
+            continue
+
+        scale = bias_quant["scales"][i] / output_quant["scales"][0]
+        channel_constant = round(bias_data[i] * scale + out_zero_point)
+        clipped = min(127, max(-128, channel_constant))
+
+        out_channel_values = output_data[0, :, :, i].flatten()
+        assert all(x == clipped for x in out_channel_values)
+        fixed_channels[i] = clipped
+
+    # We now need to compute values for the following depthwise layer
+    if layer == 26:

Review Comment:
   anything else you can assert here other than `== 26`?



##########
python/tvm/topi/arm_cpu/qnn.py:
##########
@@ -368,3 +389,139 @@ def kernel_ptr(buffer, c, offset=0):
 def schedule_qnn_depthwise_conv2d(_attrs, _outs, _target):
     """Schedule function for qnn.depthwise_conv2d."""
     return None
+
+
+def _make_unrolled_conv2d_primfunc(

Review Comment:
   can you document the parameters here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to