Hello Community,

DL models, besides compute-intensive operations like convolutions and fully 
connected layers, feature many simple pointwise (aka elementwise) operations, 
such as elementwise addition. The performance of those operations is fully 
memory-bandwidth bound, which limits the speedups obtainable from newer GPU 
hardware, which typically has a high compute-to-memory-bandwidth ratio. There 
are multiple ongoing efforts (e.g. TVM) to use compiler technology to deal 
with this and other, harder performance problems. However, integrating e.g. 
TVM into MXNet is a long-term effort, and in the meantime there is a need for 
a simpler, more focused approach to this problem.
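To illustrate the idea (this is only a NumPy sketch of the general principle, 
not the actual MXNet implementation, which generates fused GPU kernels): with 
separate elementwise kernels, every intermediate result makes a full round 
trip through memory, while a fused kernel reads each input and writes each 
output exactly once.

```python
import numpy as np

def unfused_add_relu(a, b):
    # Two separate elementwise "kernels": the intermediate array t
    # is written to memory by the first and read back by the second,
    # doubling the memory traffic for a bandwidth-bound workload.
    t = a + b                 # kernel 1: read a, b; write t
    return np.maximum(t, 0)   # kernel 2: read t; write result

def fused_add_relu(a, b):
    # A fused kernel computes add + relu in a single pass: each input
    # element is read once and each output element is written once,
    # with no intermediate array materialized in memory.
    out = np.empty_like(a)
    np.add(a, b, out=out)
    np.maximum(out, 0, out=out)
    return out
```

Counting array-sized memory transfers, the unfused version moves six arrays 
(read a, b; write t; read t; write result, plus the allocation of t), while 
the fused version moves three (read a, b; write result), roughly halving the 
memory traffic for this two-operation chain.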

This proposal (design doc [1], PR [2]) is intended as a short-term solution 
to this problem, using the existing NNVM backend in MXNet and without 
requiring a big refactoring.

Any feedback and help will be greatly appreciated.

Thank you,
Przemek

[1] https://cwiki.apache.org/confluence/display/MXNET/GPU+Pointwise+fusion
[2] https://github.com/apache/incubator-mxnet/pull/15167
