billishyahao opened a new pull request, #11513: URL: https://github.com/apache/tvm/pull/11513
This patch enhances the performance of DNNL BYOC dense operators by 1) introducing GELU fusion and 2) altering the dense weight layout.

**Why introduce GELU fusion:** The BERT model family uses the GELU (Gaussian Error Linear Unit) activation heavily, so fusing GELU into the dense operator yields a noticeable performance boost for these models.

**Why introduce automatically packed dense and its altered weight layout:** Format `tag::ab` (a.k.a. `tag::NC`) is not the best format for the DNNL `inner_product` primitive; relying on it is a drawback of the current DNNL BYOC module.

**Which models does this fit:** Dense-intensive models such as the BERT family.

With this patch, I benchmarked the inference performance of PCPVT, a vision transformer (https://arxiv.org/abs/2104.13840), on ICX-8352Y. Here is some boost data:

| 32 cores | Latency (std. dev.) |
|--|--|
| stock BYOC | 46.37 ms (0.45 ms) |
| BYOC w/ patch | 38.68 ms (0.35 ms) |
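For reference, the GELU activation being fused can be sketched in plain Python. This is only an illustration of the math, not the patch's implementation; the exact form (erf-based vs. the tanh approximation) depends on how the frontend lowered the model:

```python
import math

def gelu(x):
    # Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # Common tanh approximation, often emitted by frontends instead of erf
    return 0.5 * x * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```

Fusing this elementwise epilogue into the dense primitive avoids materializing the intermediate dense output, which is where the speedup for GELU-heavy models like BERT comes from.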

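The weight-layout change can also be sketched conceptually. The actual blocked format is chosen by DNNL at runtime and is hardware-dependent; the toy `pack_weight` below (block size and layout are illustrative assumptions, not the patch's code) just shows the idea of reordering a plain `[N][K]` (`tag::ab`) weight into a blocked layout so the output channels consumed together by a vectorized kernel are contiguous:

```python
def pack_weight(weight, block_n=4):
    # Reorder a dense weight from plain [N][K] (tag::ab) into a blocked
    # [N // block_n][K][block_n] layout. Assumes N % block_n == 0 for
    # simplicity; real DNNL formats also handle padding and other blockings.
    n, k = len(weight), len(weight[0])
    assert n % block_n == 0
    return [[[weight[nb * block_n + ni][ki] for ni in range(block_n)]
             for ki in range(k)]
            for nb in range(n // block_n)]
```

Doing this reorder once at compile time (by altering the weight layout) is cheaper than letting the primitive reorder from `tag::ab` on every inference call.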