vinx13 opened a new pull request, #12864:
URL: https://github.com/apache/tvm/pull/12864

   This PR adds a tuple-sum based implementation of layer norm. It performs a
one-pass reduction that computes the mean and variance at the same time.
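   As a rough illustration of the idea (plain Python, not the TVM TE/TIR code in
this PR), a single pass can accumulate the tuple (sum of x, sum of x^2). The
statistics then follow as mean = s / n and var = s2 / n - mean^2. The helper
names `row_stats` and `layer_norm` are hypothetical.

```python
import math
from typing import List, Tuple


def row_stats(row: List[float]) -> Tuple[float, float]:
    """One-pass reduction: accumulate (sum, sum of squares) together."""
    s, s2 = 0.0, 0.0
    for x in row:
        s += x
        s2 += x * x
    n = len(row)
    mean = s / n
    var = s2 / n - mean * mean  # Var[x] = E[x^2] - E[x]^2
    return mean, var


def layer_norm(row: List[float], gamma: List[float], beta: List[float],
               eps: float = 1e-5) -> List[float]:
    """Normalize one row with the one-pass stats, then scale and shift."""
    mean, var = row_stats(row)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std * g + b for x, g, b in zip(row, gamma, beta)]
```

   The one-pass form avoids a second sweep over the data. The trade-off is that
E[x^2] - E[x]^2 can lose precision when the mean is large relative to the spread.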
   A reducer pattern is also added so that `LowerCrossThreadReduction` can handle
this case.
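   Cross-thread lowering works here because the tuple combiner is associative, with
identity (0, 0). The sketch below is a plain-Python simulation of that
property, not the pattern actually matched by `LowerCrossThreadReduction`. Each
"thread" reduces a strided slice to a partial (sum, sum of squares). The partials
are then merged pairwise in a tree, in the style of a shared-memory or
warp-shuffle reduction. The names `combine`, `thread_partials`, and `tree_reduce`
are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[float, float]
IDENTITY: Pair = (0.0, 0.0)  # identity element of the tuple-sum reducer


def combine(a: Pair, b: Pair) -> Pair:
    # Element-wise add: associative and commutative, so merge order is free
    return (a[0] + b[0], a[1] + b[1])


def thread_partials(row: List[float], num_threads: int) -> List[Pair]:
    # Thread t accumulates elements t, t + num_threads, t + 2 * num_threads, ...
    partials = []
    for t in range(num_threads):
        acc = IDENTITY
        for x in row[t::num_threads]:
            acc = combine(acc, (x, x * x))
        partials.append(acc)
    return partials


def tree_reduce(partials: List[Pair]) -> Pair:
    # Pairwise halving across "threads"; assumes a non-empty power-of-two count
    vals = list(partials)
    assert vals and (len(vals) & (len(vals) - 1)) == 0
    stride = len(vals) // 2
    while stride > 0:
        for i in range(stride):
            vals[i] = combine(vals[i], vals[i + stride])
        stride //= 2
    return vals[0]
```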
   On CUDA, this generates two kernels: one for the reduction and one for the
elementwise operations. Because of a current limitation of `compute_at`, we are
not yet able to fuse them into a single kernel.
   
   cc @MasterJH5574 @junrushao @AndrewZhaoLuo 

