vinx13 opened a new pull request, #12864: URL: https://github.com/apache/tvm/pull/12864
This PR adds a tuple-sum based implementation of layer norm. It performs a one-pass reduction that computes the mean and the variance at the same time. A reducer pattern is also added so that `LowerCrossThreadReduction` can handle this case. On CUDA, it generates two kernels: one for the reduction and one for the elementwise operations. Because of a current limitation of `compute_at`, we are not yet able to fuse them into a single kernel.

cc @MasterJH5574 @junrushao @AndrewZhaoLuo
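To illustrate the idea of a one-pass tuple-sum reduction, here is a minimal pure-Python sketch. It is not the TVM TE/TIR code from this PR. It assumes the reduction state is a `(sum, sum_of_squares)` pair, so that mean and variance follow from E[x] and E[x²] − E[x]². The names `IDENTITY`, `combine`, `lift`, `tree_reduce`, `sequential_reduce` and `layer_norm` are hypothetical. `tree_reduce` only mimics how threads would combine partial results in a cross-thread reduction. Because `combine` is associative with identity `(0, 0)`, it gives the same result as a sequential fold, and that property is what lets a reducer be split across threads.

```python
import math
from functools import reduce

# Hypothetical tuple reducer: each partial state is (sum, sum_of_squares).
IDENTITY = (0.0, 0.0)


def combine(a, b):
    """Associative, commutative combiner for (sum, sum_sq) pairs."""
    return (a[0] + b[0], a[1] + b[1])


def lift(x):
    """Map one input element to its partial (sum, sum_sq) state."""
    return (x, x * x)


def sequential_reduce(states):
    """Single-thread fold over the partial states."""
    return reduce(combine, states, IDENTITY)


def tree_reduce(states):
    """Pairwise tree reduction, mimicking threads combining partials."""
    states = list(states) or [IDENTITY]
    while len(states) > 1:
        nxt = []
        for i in range(0, len(states), 2):
            if i + 1 < len(states):
                nxt.append(combine(states[i], states[i + 1]))
            else:
                nxt.append(states[i])  # odd element passes through
        states = nxt
    return states[0]


def layer_norm(row, gamma, beta, eps=1e-5):
    n = len(row)
    # Reduction step: a single sweep yields both sums at once.
    s, sq = tree_reduce(lift(x) for x in row)
    mean = s / n
    # Var = E[x^2] - E[x]^2; clamp tiny negatives caused by rounding.
    var = max(sq / n - mean * mean, 0.0)
    inv_std = 1.0 / math.sqrt(var + eps)
    # Elementwise step: normalize, then scale and shift.
    return [(x - mean) * inv_std * g + b for x, g, b in zip(row, gamma, beta)]


if __name__ == "__main__":
    print(layer_norm([1.0, 2.0, 3.0, 4.0], [1.0] * 4, [0.0] * 4))
```

The two phases of `layer_norm` line up with the two CUDA kernels described above: a reduction followed by an elementwise pass. One caveat of the E[x²] − E[x]² form is that it can lose precision to cancellation when the mean is large relative to the spread. A production implementation may therefore choose a different accumulation scheme.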
