fxmarty commented on PR #15111: URL: https://github.com/apache/tvm/pull/15111#issuecomment-1614601193
Thank you. I'm trying to connect the dots between the differing opinions I hear about using the `mma.sync.aligned` op for GEMV (which, from my understanding of CUTLASS, is what the FasterTransformer extension comes down to) versus not using it, and your results are in line with my current intuition. The only remaining mystery to me is why PyTorch chooses a tensor-core-based kernel for fp16 × fp16 GEMV.

Maybe there is an opportunity for a CUTLASS extension based instead on https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/kernel/gemv.h / https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/kernel/gemv_batched_strided.h for the decoding, along with a GEMM kernel for the prefill.
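For context on why decoding and prefill might want separate kernels: in prefill the activation is a matrix, while in single-token decoding it collapses to one row, turning the matmul into a GEMV. A minimal NumPy sketch of the shape difference (the dimensions here are illustrative assumptions, not taken from this PR):

```python
import numpy as np

# Hypothetical weight for one transformer linear layer.
hidden, out_features = 4096, 4096
W = np.random.randn(out_features, hidden).astype(np.float16)

# Prefill: the whole prompt is processed at once, so the activation is a
# (seq_len x hidden) matrix and the op is a GEMM.
prefill_x = np.random.randn(512, hidden).astype(np.float16)
prefill_y = prefill_x @ W.T  # shape (512, out_features): GEMM

# Decoding: one new token per step, so the activation degenerates to a
# single row and the op is effectively a GEMV (matrix-vector product).
decode_x = np.random.randn(1, hidden).astype(np.float16)
decode_y = decode_x @ W.T  # shape (1, out_features): GEMV
```

The decoding case is memory-bandwidth-bound (each weight is read once per token), which is why a dedicated GEMV kernel rather than a tensor-core GEMM path can make sense there.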
