zhuwenxi commented on a change in pull request #47:
URL: https://github.com/apache/tvm-rfcs/pull/47#discussion_r771072867



##########
File path: rfcs/0046-Intel-LIBXSMM-integration.md
##########
@@ -0,0 +1,97 @@
+# Summary
+This RFC introduces a plan for integrating LIBXSMM into TVM. LIBXSMM leverages a JIT code generator to produce highly efficient kernels targeting x86 architectures.
+
+For details of LIBXSMM, please refer to:
+* [LIBXSMM User Manual](https://libxsmm.readthedocs.io/en/latest/)
+* [LIBXSMM github repo](https://github.com/hfp/libxsmm)
+
+# Motivation
+TVM has shown satisfactory performance on MLP models on CPU. However, there are still defects in the assembly code generated by LLVM that prevent AutoTVM/AutoScheduler from achieving optimal performance on GEMM.
+
+LIBXSMM is an open-source library developed by Intel Labs for accelerating small matrix multiplication. It leverages a JIT code generator to produce highly efficient GEMM kernels for x86 CPUs, which can come very close to the hardware roofline. According to our evaluation, on "small" GEMM (cube_root(m * n * k) <= 256), LIBXSMM outperforms the well-known BLAS library Intel MKL.
+
+Moreover, given that LIBXSMM can generate highly efficient GEMM kernel implementations, it is also an ideal substitute for the inner kernel of normal-size GEMM. According to our experiments, the AutoTVM templates we wrote with LIBXSMM for register-block generation achieve much higher performance than MKL and the existing TOPI implementation.
+
+# Guide-level explanation
+This proposal aims to integrate LIBXSMM into TVM to accelerate small GEMM and to serve as the inner kernel for normal-size GEMM.
+
+We will integrate LIBXSMM with TVM through the following three components:
+1. Add an extern call "tvm.contrib.libxsmm.gemm" in the "src/runtime/contrib" directory, and a corresponding Python interface in the "python/tvm/contrib/" directory, so users can call it just as they call CBLAS;
+2. Use BYOC to accelerate small GEMM (cube_root(m * n * k) <= 256) and its epilogue-fusion variations (bias/relu/sigmoid/bias_relu/bias_sigmoid);
+3. Add the AutoTVM template we wrote with LIBXSMM as the inner kernel to TOPI, as a GEMM implementation candidate.
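To make the size cutoff in item 2 concrete, the "small GEMM" criterion can be sketched in plain Python (the helper name below is hypothetical, for illustration only; it is not part of the proposed API):

```python
def is_small_gemm(m: int, n: int, k: int, threshold: int = 256) -> bool:
    """Return True if a GEMM of shape (m, n, k) qualifies as "small"
    per this RFC's cutoff: cube_root(m * n * k) <= 256, which is
    equivalent to m * n * k <= 256 ** 3 (avoiding floating-point roots)."""
    return m * n * k <= threshold ** 3

# 256x256x256 sits exactly on the boundary, so it qualifies.
assert is_small_gemm(256, 256, 256)
# 1024x1024x1024 is well above the cutoff.
assert not is_small_gemm(1024, 1024, 1024)
```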
+
+# Reference-level explanation
+1. Users can call LIBXSMM just like CBLAS through the extern call API.
+```python
+def matmul(
+    lhs, rhs, transa=False, transb=False, alpha=1.0, beta=0.0, lda=-1, ldb=-1, ldc=-1, **kwargs
+):
+    n = lhs.shape[1] if transa else lhs.shape[0]
+    m = rhs.shape[0] if transb else rhs.shape[1]
+    return te.extern(
+        (n, m),
+        [lhs, rhs],
+        lambda ins, outs: tvm.tir.call_packed(
+            "tvm.contrib.libxsmm.matmul",
+            ins[0], ins[1], outs[0],
+            transa, transb, alpha, beta, lda, ldb, ldc,
+        ),
+        name="C",
+        **kwargs,
+    )
+```
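The parameters of this wrapper follow standard BLAS GEMM conventions. As a plain-NumPy reference for those semantics (the function below is a hypothetical illustration, not part of TVM or LIBXSMM; the real computation is performed by LIBXSMM's JIT kernels):

```python
import numpy as np

def gemm_reference(lhs, rhs, transa=False, transb=False, alpha=1.0, beta=0.0, out=None):
    """NumPy sketch of BLAS-style GEMM semantics:
    C = alpha * op(A) @ op(B) + beta * C, where op() transposes on request."""
    a = lhs.T if transa else lhs
    b = rhs.T if transb else rhs
    c = alpha * (a @ b)
    if out is not None:
        c = c + beta * out
    return c

# A 2x3 operand times a 3x4 operand yields a 2x4 result.
print(gemm_reference(np.ones((2, 3)), np.ones((3, 4))).shape)  # (2, 4)
```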
+2. BYOC allows for graph partitioning and using LIBXSMM for code generation.
+       * API to obtain the partitioned function:
+```
+       from tvm.relay.op.contrib import libxsmm
+
+       # API to call LIBXSMM partitioning
+       libxsmm_module = libxsmm.partition_for_libxsmm(module)

Review comment:
       Fixed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

