ashutosh-arm commented on a change in pull request #15: URL: https://github.com/apache/tvm-rfcs/pull/15#discussion_r687853090
########## File path: rfcs/0015_Arm_CMSIS-NN_Integration.md ##########
@@ -0,0 +1,143 @@
+- Feature Name: [RFC] Use CMSIS-NN with TVM
+- Start Date: July 2021
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/15
+- GitHub Issue: https://github.com/apache/tvm/issues/8646
+
+# Acronyms
+CMSIS: Common Microcontroller Software Interface Standard
+ACL: The Compute Library for the Arm® Architecture
+MLF: Model Library Format
+FVP: Arm® Corstone™-300 Fixed Virtual Platform
+
+# Summary
+
+This RFC introduces a plan for integrating the CMSIS-NN library into TVM. The library consists of efficient kernels targeted at Arm's Cortex-M architecture.
+
+Please refer to the following pages for more details on CMSIS-NN.
+* https://arm-software.github.io/CMSIS_5/NN/html/index.html
+* https://github.com/ARM-software/CMSIS_5/tree/develop/CMSIS/NN
+
+The first PR in the series of PRs fulfilling this integration will be the graph partitioner for int8 softmax. A detailed plan can be found below in this RFC.
+
+
+# Motivation
+
+The CMSIS-NN library consists of hand-tuned kernels that are suitable for Cortex-M and are compliant with the quantization scheme used in TensorFlow Lite. They have been optimized for better performance and the small memory footprint required on these embedded devices, and it makes sense for TVM to reuse them while generating code for Cortex-M. They have already been integrated with the TensorFlow Lite Micro project.
+
+
+# Guide-level explanation
+
+TVM's BYOC infrastructure allows for partitioning and code generation using an external compiler. Partitioned subgraphs containing operator(s) targeted for Cortex-M can then be translated into the CMSIS-NN C APIs, which eventually become part of the MLF.
+
+If a user runs tvmc, they will get an MLF-format archive which calls out to the CMSIS-NN operators.
+
+```
+tvmc --target=cmsisnn,c --output-format=mlf --executor=aot
+```
+
+
+# Reference-level explanation
+
+We will enable this integration by considering TFLite networks, but it is equally applicable to all other networks that can be translated into Relay IR. A TFLite test that contains just a quantized (int8) softmax is first converted by the TFLite frontend into the following sequence of Relay operations: *dequantize -> softmax -> quantize*. Please refer to the Relay code snippet below, obtained from the TFLite frontend.
+
+```python
+def @main(%a: Tensor[(1, 16, 16, 3), int8]) -> Tensor[(1, 16, 16, 3), int8] {
+  %0 = qnn.dequantize(%a, 0.02f /* ty=float32 */, 64 /* ty=int32 */) /* ty=Tensor[(1, 16, 16, 3), float32] */;
+  %1 = nn.softmax(%0) /* ty=Tensor[(1, 16, 16, 3), float32] */;
+  qnn.quantize(%1, 0.02f /* ty=float32 */, 64 /* ty=int32 */, out_dtype="int8") /* ty=Tensor[(1, 16, 16, 3), int8] */
+}
+```
+
+Here is the API to obtain the partitioned function aimed at CMSIS-NN.
+
+```python
+# API to call CMSIS-NN partitioning
+from tvm.relay.op.contrib import cmsisnn
+# Here, module is the Relay module
+cmsisnn_module = cmsisnn.partition_for_cmsisnn(module)
+```
+
+The following code block shows the resultant IRModule.
+
+```python
+def @main(%a: Tensor[(1, 16, 16, 3), int8]) -> Tensor[(1, 16, 16, 3), int8] {
+  @tvmgen_default_cmsisnn_0(%a) /* ty=Tensor[(1, 16, 16, 3), int8] */
+}
+
+def @tvmgen_default_cmsisnn_0(%cmsisnn_0_i0: Tensor[(1, 16, 16, 3), int8], Inline=1, Compiler="cmsisnn", global_symbol="tvmgen_default_cmsisnn_0", Primitive=1) -> Tensor[(1, 16, 16, 3), int8] {
+  %2 = fn (%FunctionVar_0_0: Tensor[(1, 16, 16, 3), int8], PartitionedFromPattern="qnn.dequantize_nn.softmax_qnn.quantize_", Composite="cmsisnn.qnn_softmax") -> Tensor[(1, 16, 16, 3), int8] {
+    %0 = qnn.dequantize(%FunctionVar_0_0, 0.02f /* ty=float32 */, 64 /* ty=int32 */) /* ty=Tensor[(1, 16, 16, 3), float32] */;
+    %1 = nn.softmax(%0) /* ty=Tensor[(1, 16, 16, 3), float32] */;
+    qnn.quantize(%1, 0.02f /* ty=float32 */, 64 /* ty=int32 */, out_dtype="int8") /* ty=Tensor[(1, 16, 16, 3), int8] */
+  };
+  %2(%cmsisnn_0_i0) /* ty=Tensor[(1, 16, 16, 3), int8] */
+}
+```
+
+The partitioned function above is presented to the CMSIS-NN external code generator for *tir* generation using TVM's build() API.
+
+```python
+# Invoke the AOT compiler to get the MLF containing CMSIS-NN APIs
+with tvm.target.Target("c -runtime=c --link-params -mcpu=cortex-m55 --executor=aot --unpacked-api=1"):
+    factory = tvm.relay.build(cmsisnn_module)
+```
+
+Intermediate *tir* looks like this:

Review comment:
   I've added a sample pattern-matching table. AFAIU, as part of this work, we plan to use APIs that can be used for Relay-to-CMSIS-NN mapping without any Relay-level transforms.
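As an aside on the Relay snippets in the diff above: `qnn.quantize` and `qnn.dequantize` implement the standard affine int8 scheme used by TFLite, with the scale (0.02) and zero point (64) shown in the example. The following NumPy sketch is illustrative only (it is not TVM or CMSIS-NN code) and shows the arithmetic those two operators perform around `nn.softmax`:

```python
import numpy as np

def quantize_int8(x, scale, zero_point):
    """Affine quantization, as in qnn.quantize: q = round(x / scale) + zp, clamped to int8."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize_int8(q, scale, zero_point):
    """Inverse mapping, as in qnn.dequantize: x = (q - zp) * scale."""
    return (q.astype(np.int32) - zero_point) * scale

# Scale and zero point taken from the Relay snippets above.
scale, zero_point = 0.02, 64
q = np.array([-128, 0, 64, 127], dtype=np.int8)
x = dequantize_int8(q, scale, zero_point)    # float values, e.g. what nn.softmax would consume
q2 = quantize_int8(x, scale, zero_point)     # round-trips back to the original int8 values
```

Since the output quantization parameters in the example match the input ones, quantizing a dequantized value is an exact round trip; a CMSIS-NN kernel such as the int8 softmax instead folds this arithmetic into fixed-point parameters so no float math is needed on the target.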
