kpuatamazon edited a comment on pull request #19562:
URL: https://github.com/apache/incubator-mxnet/pull/19562#issuecomment-744344661


   I've been using a c5.12xlarge with an `Intel(R) Xeon(R) Platinum 8275CL CPU @ 
3.00GHz`.  I assume these numbers are some sort of seconds?  
   
   We should at least try building with `-march=native` to see if it's just a 
matter of CPU support, i.e. MXNet doesn't seem to enable AVX512 by default, and 
one could add CPUID dispatch.  
   
   Might as well reshape to two dimensions, with the normalized axis preserved 
and all other dimensions multiplied together.  The problem is identical for e.g. 
100x28x10x10x10 and 280000x10.  Also, those are some really small channels to 
layer normalize over.  
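
To illustrate the reshape point, here is a minimal NumPy sketch, not MXNet's implementation. The `layer_norm` helper, its `eps` value, and the omission of gamma/beta are assumptions made for illustration. It normalizes over the last axis of a 100x28x10x10x10 tensor and of the same data viewed as 280000x10, then checks that the results agree.

```python
import numpy as np

def layer_norm(x, axis=-1, eps=1e-5):
    """Reference layer normalization over one axis (hypothetical helper, no gamma/beta)."""
    mean = x.mean(axis=axis, keepdims=True)
    var = x.var(axis=axis, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 28, 10, 10, 10))

# Normalize over the last axis of the 5-D tensor.
y_5d = layer_norm(x, axis=-1)

# Same data viewed as 2-D: the normalized axis is kept (10) and all
# other dimensions are multiplied together (100*28*10*10 = 280000).
x_2d = x.reshape(-1, x.shape[-1])
y_2d = layer_norm(x_2d, axis=-1)

# The two results should match up to floating-point tolerance.
results_match = np.allclose(y_5d.reshape(x_2d.shape), y_2d)
```

Since the reshape of a contiguous array is only a view, a kernel written for the 2-D case would cover the higher-rank case without copying.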
   
   Also, I feel like the optimal assembly implementation would benefit from a 
different ordering of the input tensor to allow for pure vertical adds, whereas 
layer normalization is currently set up for horizontal adds.  
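
As a rough sketch of the layout idea (NumPy only models the access pattern here; it says nothing about which instructions are actually emitted, and the sizes and variable names are hypothetical): with the normalized axis contiguous, each sum is a reduction across one row, which corresponds to a horizontal add within a register. If the tensor were stored transposed, each row would hold one channel for all samples, and the sums could be accumulated with elementwise (vertical) adds, one lane per sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c = 8, 10                      # hypothetical sizes: 8 samples, 10 channels
x = rng.standard_normal((n, c))   # current layout: normalized axis is contiguous

# Horizontal pattern: each row is reduced across its own elements.
row_sums = x.sum(axis=1)

# Vertical pattern: store the data as (c, n) so that each row holds one
# channel for all samples, then accumulate rows elementwise.
xt = np.ascontiguousarray(x.T)
acc = np.zeros(n)
for ch in range(c):
    acc += xt[ch]                 # elementwise add, one lane per sample

# Both patterns compute the same per-sample sums.
sums_match = np.allclose(acc, row_sums)
```

The same accumulation pattern would apply to the sum of squares needed for the variance.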


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.