kpuatamazon commented on pull request #19562:
URL: https://github.com/apache/incubator-mxnet/pull/19562#issuecomment-744344661


   I've been using a c5.12xlarge (`Intel(R) Xeon(R) Platinum 8275CL CPU @ 
3.00GHz`).  I assume the reported numbers are in some unit of seconds?  
   
   We should at least try building with `-march=native` to see whether it's 
just a matter of CPU support: MXNet doesn't seem to enable AVX512 by default, 
though one could add CPUID dispatch.  
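
   To make the dispatch idea concrete, here is a minimal sketch in Python rather than MXNet's C++. It assumes a Linux-style `/proc/cpuinfo` listing as the source of feature flags, and the kernel names are hypothetical placeholders, not MXNet symbols. A real implementation would query CPUID directly.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags of the first CPU in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def select_kernel(flags):
    """Pick the widest kernel the CPU supports (kernel names are hypothetical)."""
    if "avx512f" in flags:
        return "layer_norm_avx512"
    if "avx2" in flags:
        return "layer_norm_avx2"
    return "layer_norm_generic"
```

   On Linux this could be fed with `parse_cpu_flags(open("/proc/cpuinfo").read())`, so the choice is made once at startup instead of at build time.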
   
   Might as well reshape to two dimensions, with the normalized axis preserved 
and all the other dimensions multiplied together, right?  
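
   A rough sketch of that reshape, in plain Python for illustration only. It assumes the normalized axis is the trailing one and the buffer is row-major, and it leaves out gamma and beta. The function names and the `eps` default are mine, not MXNet's.

```python
import math


def collapse_shape(shape, axis=-1):
    """Collapse an N-d shape to (rows, cols), keeping the normalized axis as cols."""
    axis = axis % len(shape)
    if axis != len(shape) - 1:
        # A non-trailing axis would need a transpose or an (outer, axis, inner) view.
        raise ValueError("this sketch only handles the trailing axis")
    cols = shape[axis]
    rows = math.prod(shape[:axis])  # every other dimension multiplied together
    return rows, cols


def layer_norm_2d(flat, rows, cols, eps=1e-5):
    """Normalize each row of a row-major rows x cols buffer (no gamma/beta)."""
    out = [0.0] * (rows * cols)
    for r in range(rows):
        row = flat[r * cols:(r + 1) * cols]
        mean = sum(row) / cols
        var = sum((x - mean) ** 2 for x in row) / cols
        inv_std = 1.0 / math.sqrt(var + eps)
        for c in range(cols):
            out[r * cols + c] = (row[c] - mean) * inv_std
    return out
```

   With this, a `(2, 3, 4)` input is handled as 6 independent rows of 4 elements, so the kernel only ever needs to deal with the two-dimensional case.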
   
   Also, I feel like the optimal assembly implementation would benefit from a 
different ordering of the input tensor, one that allows for pure vertical adds, 
whereas layer normalization is currently set up for horizontal adds.  
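
   To illustrate the difference, a scalar Python sketch that only mimics the access patterns (it does not use SIMD, and the function names are made up). With the row-major layout, each sum reduces contiguous elements, which in a vector register means a horizontal add. With a transposed layout, each position in the accumulator belongs to a different row, so the sums build up through element-wise (vertical) adds.

```python
def transpose(flat, rows, cols):
    """Reorder a row-major rows x cols buffer into cols x rows."""
    return [flat[r * cols + c] for c in range(cols) for r in range(rows)]


def row_sums_horizontal(flat, rows, cols):
    """Row-major layout: reduce across the contiguous elements of each row."""
    return [sum(flat[r * cols:(r + 1) * cols]) for r in range(rows)]


def row_sums_vertical(flat_t, rows, cols):
    """Transposed layout: accumulate one element of every row per step."""
    acc = [0.0] * rows
    for c in range(cols):
        chunk = flat_t[c * rows:(c + 1) * rows]
        # In a SIMD kernel this inner loop would be plain vector adds,
        # with no cross-lane reduction at the end.
        for r in range(rows):
            acc[r] += chunk[r]
    return acc
```

   Both functions return the same per-row sums. The trade-off is that the vertical version needs the input reordered (or produced in that order upstream) first.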

