apeskov commented on PR #11508:
URL: https://github.com/apache/tvm/pull/11508#issuecomment-1144075748

   @crazydemo Answering your question about performance.
   
   > I wonder if we can get better performance via running layernorm on dnnl 
codegen than running consecutive ops on native codegen. Could you please 
provide some performance numbers?
   
   Yes, there is a performance benefit. At the very least, the two implementations use different memory access patterns. Consecutive ops with the LLVM codegen will produce a sequence of fused kernels like the following:
   * mean: one pass through the src tensor
   * sub: one pass through the src and dst tensors
   * power + mean: one pass through src
   * add + sqrt + div + mul + add: one pass through src and dst
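   To make the pass structure concrete, here is a rough pure-Python sketch of that decomposition. Each step is written as a separate traversal of the data, mirroring one fused kernel in the list above; this is an illustration of the memory-access pattern, not actual TVM codegen output, and the function name is my own.

```python
import math

def layernorm_multipass(src, gamma, beta, eps=1e-5):
    """Layer norm decomposed into the fused kernels listed above.

    Each step below is a separate loop over the data, mirroring one
    TVM-generated kernel (illustrative sketch, not real codegen output).
    """
    n = len(src)
    # Kernel 1: mean -- one pass through src.
    mean = sum(src) / n
    # Kernel 2: sub -- reads src, writes an intermediate (src + dst pass).
    centered = [x - mean for x in src]
    # Kernel 3: power + mean -- one pass through the intermediate.
    var = sum(x * x for x in centered) / n
    # Kernel 4: add + sqrt + div + mul + add -- reads intermediate, writes dst.
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (x * inv_std) + b for x, g, b in zip(centered, gamma, beta)]
```

   Counting the loops gives the 6 buffer traversals discussed below: 1 + 2 + 1 + 2.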
   
   In total, the TVM codegen traverses the data tensor 6 times. DNNL implements layer norm as a single kernel and does only 4 passes through memory buffers (or 3 in the case of in-place memory).
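   A single-kernel formulation can be sketched the same way. The version below fuses the mean and variance reductions into one read of src and then produces dst in one more pass, so it touches the buffers 3 times (2 with in-place output). This is only a sketch of the idea behind a fused library kernel, not DNNL's actual implementation; the exact pass count of the real kernel depends on its internals.

```python
import math

def layernorm_fused(src, gamma, beta, eps=1e-5):
    """Single-kernel layer norm sketch: one read of src for both
    reductions, then one read of src plus one write of dst.
    Illustrative only -- not the actual DNNL kernel.
    """
    n = len(src)
    # Pass 1: accumulate sum and sum of squares in a single read of src.
    s = sq = 0.0
    for x in src:
        s += x
        sq += x * x
    mean = s / n
    var = sq / n - mean * mean  # E[x^2] - E[x]^2
    inv_std = 1.0 / math.sqrt(var + eps)
    # Pass 2: read src, write dst (in place this would be one buffer).
    return [g * ((x - mean) * inv_std) + b for x, g, b in zip(src, gamma, beta)]
```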
   
   On a multi-core system (Xeon servers and the like), the normalization op is memory bound, so reducing the number of memory accesses becomes even more important.
      
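   For a purely memory-bound op, the expected runtime is roughly bytes moved divided by memory bandwidth, so the pass counts translate directly into a speedup estimate. The tensor size and the 100 GB/s bandwidth figure below are hypothetical placeholders for illustration, not measurements from this PR:

```python
def est_time_ms(n_elems, passes, bytes_per_elem=4, bw_gbps=100.0):
    """Back-of-envelope time for a memory-bound op: bytes moved / bandwidth.

    The bandwidth value is a hypothetical placeholder, not a measured
    number; only the ratio between configurations matters here.
    """
    bytes_moved = n_elems * bytes_per_elem * passes
    return bytes_moved / (bw_gbps * 1e9) * 1e3  # milliseconds

n = 1 << 24                          # hypothetical fp32 tensor size
tvm_t = est_time_ms(n, passes=6)     # TVM codegen: 6 traversals
dnnl_t = est_time_ms(n, passes=3)    # DNNL, in-place: 3 traversals
speedup = tvm_t / dnnl_t             # 6 / 3 = 2x in the bandwidth-bound limit
```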


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
