Qianshui-Jiang commented on code in PR #13642:
URL: https://github.com/apache/tvm/pull/13642#discussion_r1059227039


##########
python/tvm/topi/x86/tensor_intrin.py:
##########
@@ -348,3 +348,227 @@ def _instr(index):
         binds={data: a_buffer, kernel: b_buffer},
         default_buffer_params=buffer_params,
     )
+
+
+def dot_32x128x32_u8s8s32_sapphirerapids(LDA):
+    """
+    Int8 dot product by every 16x64 elements using AMX-TMUL Sapphire Rapids instructions.
+    The tdpxxd instruction takes two tile of uint8 and int8 datatype -- data[16][64] and
+    kernel[1][16][16][4] -- and computes a dot product of data[16][16] in int32 datatype.
+
+    (Physically, to efficiently leveraging the tile register, we constructing a 2x2 tiles
+    matmul which performs 32x128x32 in total)
+
+    The pseudo code is as follows:
+        for(k=0; k<2; k++){
+            for(n=0; n<2; n++){
+                tileload64(tmm_b, B)
+                for(m=0; m<2; m++){
+                    if(n==0)
+                        tileload64(tmm_a, A)

Review Comment:
   @cbalint13, thanks a lot for your advice.
   TVM already has a DNNL codegen and some external calls to DNNL, an off-the-shelf compute library that leverages SIMD ISAs transparently.
   The main focus of this PR is to add more native capability for TVM to generate AMX instructions.
   AFAIK, LLVM's AMX support is also moving forward, for example in register allocation for intrinsics and auto-vectorization.
   Hence we chose to call LLVM intrinsics directly, and we hope this gives us more flexibility in the future.
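
For readers following the quoted docstring, here is a minimal plain-Python reference model of the arithmetic that a 2x2-tile u8 x s8 -> s32 matmul of shape 32x128x32 would perform. It is a sketch under stated assumptions, not the PR's code: the function names `tile_dot_u8s8s32` and `matmul_32x128x32_tiled` are hypothetical, the kernel tile layout `[16][16][4]` is read off the docstring, and the loop order only loosely mirrors the (truncated) pseudo code. It emulates no tile registers and issues no AMX instructions.

```python
TILE_M, TILE_N, TILE_K = 16, 16, 64  # one tile: 16 rows x 64 bytes


def tile_dot_u8s8s32(acc, a_tile, b_tile):
    """Accumulate a_tile (16x64 uint8) times b_tile into acc (16x16 ints).

    b_tile is assumed to be packed as [k_outer=16][n=16][k_inner=4],
    i.e. four consecutive K elements sit next to each other per column.
    """
    for m in range(TILE_M):
        for n in range(TILE_N):
            total = 0
            for k in range(TILE_K):
                total += a_tile[m][k] * b_tile[k // 4][n][k % 4]
            acc[m][n] += total


def matmul_32x128x32_tiled(A, B):
    """A: 32x128 uint8 values, B: 128x32 int8 values, both row-major lists.

    Returns the 32x32 product, computed as 2 (K) x 2 (N) x 2 (M) tile steps.
    """
    C = [[0] * 32 for _ in range(32)]
    for k in range(2):
        for n in range(2):
            # Pack the 64x16 block of B into the assumed [16][16][4] layout.
            b_tile = [
                [[B[k * 64 + ko * 4 + ki][n * 16 + nn] for ki in range(4)]
                 for nn in range(16)]
                for ko in range(16)
            ]
            for m in range(2):
                # Slice the 16x64 block of A for this (m, k) step.
                a_tile = [row[k * 64:(k + 1) * 64]
                          for row in A[m * 16:(m + 1) * 16]]
                acc = [[0] * TILE_N for _ in range(TILE_M)]
                tile_dot_u8s8s32(acc, a_tile, b_tile)
                for i in range(TILE_M):
                    for j in range(TILE_N):
                        C[m * 16 + i][n * 16 + j] += acc[i][j]
    return C


# Deterministic inputs within the uint8 and int8 ranges.
A = [[(i * 7 + k * 13) % 256 for k in range(128)] for i in range(32)]
B = [[(k * 5 + j * 11) % 256 - 128 for j in range(32)] for k in range(128)]
C = matmul_32x128x32_tiled(A, B)
```

With K = 128 the largest possible magnitude of any output element is 128 * 255 * 128, which fits comfortably in int32, so the model does not need to emulate wraparound.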



