kpuatamazon edited a comment on issue #19688:
URL: https://github.com/apache/incubator-mxnet/issues/19688#issuecomment-749052168


   A related problem is excessive code generation.  Take `np.delete` for 
example.  
   
   
https://github.com/apache/incubator-mxnet/blob/16e2b15f6e334ca88f29b9c14e55547df2c136fc/src/operator/numpy/np_delete_op-inl.h#L337-L355
   
   That's:
   - MSHADOW_TYPE_SWITCH: 8 types on CPU and 7 on GPU.
   - MXNET_NDIM_SWITCH: 5 cases (ndim 1 through 5).
   - a second MSHADOW_TYPE_SWITCH: 8 types on CPU and 7 on GPU.
   - MXNET_ASSIGN_REQ_SWITCH: 2 cases.
   
   That's 8 * 5 * 8 * 2 = 640 compiled combinations on CPU and 7 * 5 * 7 * 2 = 490 on GPU.
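   
   To make the multiplication concrete, here is a minimal sketch of what that nesting expands to.  The names (`DeleteKernel`, `DispatchReq`, the toy `Req` enum) are hypothetical stand-ins rather than the real mshadow macros, and it uses only 2 dtypes instead of 8, but the combinatorics are the same:
   
```cpp
// Minimal sketch of nested dispatch: every case of the outer switch contains
// every case of the inner ones, so DeleteKernel is instantiated once per
// (input type, output type, req) combination.  With 8 dtypes, 5 ndims and
// 2 req values the same pattern yields 8 * 5 * 8 * 2 = 640 instantiations.
#include <cstdio>

enum Req { kWriteTo, kAddTo };

template <typename IType, typename OType, Req req>
void DeleteKernel() {
  // Stand-in for the real kernel body: each distinct set of template
  // arguments becomes a separately compiled function.
  std::printf("one instantiation\n");
}

template <typename IType, typename OType>
void DispatchReq(int req) {  // MXNET_ASSIGN_REQ_SWITCH analogue: x2
  if (req == kAddTo) DeleteKernel<IType, OType, kAddTo>();
  else               DeleteKernel<IType, OType, kWriteTo>();
}

// Two nested type switches (MSHADOW_TYPE_SWITCH analogue); even with just
// 2 dtypes this one call site produces 2 * 2 * 2 = 8 instantiations.
void Dispatch(int in_type, int out_type, int req) {
  switch (in_type) {
    case 0:  // float input
      if (out_type == 0) DispatchReq<float, float>(req);
      else               DispatchReq<float, double>(req);
      break;
    case 1:  // double input
      if (out_type == 0) DispatchReq<double, float>(req);
      else               DispatchReq<double, double>(req);
      break;
  }
}

int main() { Dispatch(0, 1, kWriteTo); }
```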
   
   This operation works along a single axis, so any input shape reduces to three
sizes: the outer size (the product of the dimensions before the axis), the size
of the axis itself, and the inner size (the product of the dimensions after the
axis).  After this simplification there is no ndim dispatch at all: arbitrary
dimensionality is supported, and compilation shrinks by a factor of 5, from 640
to 128 cases on CPU (and from 490 to 98 on GPU).
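   
   A sketch of that flattening, with a hypothetical `FlattenAroundAxis` helper (nothing by that name exists in MXNet); the point is that the loop bounds become three runtime sizes, so no compile-time ndim parameter is needed:
   
```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: collapse any shape into three sizes around `axis`.
struct AxisView {
  std::ptrdiff_t outer;  // product of dimensions before the axis
  std::ptrdiff_t mid;    // size of the axis being deleted from
  std::ptrdiff_t inner;  // product of dimensions after the axis
};

AxisView FlattenAroundAxis(const std::vector<std::ptrdiff_t>& shape, int axis) {
  AxisView v{1, shape[axis], 1};
  for (int i = 0; i < axis; ++i) v.outer *= shape[i];
  for (std::size_t i = axis + 1; i < shape.size(); ++i) v.inner *= shape[i];
  return v;
}

int main() {
  // A 4-d array viewed as (outer=2, mid=3, inner=20) around axis 1; the same
  // three nested loops then handle every ndim.
  AxisView v = FlattenAroundAxis({2, 3, 4, 5}, /*axis=*/1);
  (void)v;
}
```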
   
   In the common case where the input and output types are the same and the
request is kWriteTo, a loop of plain memory copies is much faster.  Since we're
just copying POD data, the size of the data type can be folded into the number
of bytes to copy, so all 8 identical-type kWriteTo cases collapse into a single
compilation, reducing 8 to 1.
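   
   A sketch of that fast path, assuming POD element types (`CopyRows` and `ConvertRows` are hypothetical names, not existing MXNet functions): the element size folds into the byte count, so the identical-type kWriteTo copy compiles exactly once, and only the conversion and/or kAddTo cases stay templated:
   
```cpp
#include <cstddef>
#include <cstring>

// Folds the dtype size into the copy length: one compiled function covers
// every identical-type kWriteTo case for POD data.
void CopyRows(void* dst, const void* src,
              std::size_t count, std::size_t elem_size) {
  std::memcpy(dst, src, count * elem_size);
}

// Only type conversion and/or kAddTo still need a per-type template.
template <typename IType, typename OType, bool add_to>
void ConvertRows(OType* dst, const IType* src, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    if (add_to) dst[i] += static_cast<OType>(src[i]);
    else        dst[i]  = static_cast<OType>(src[i]);
  }
}

int main() {
  float src[4] = {1, 2, 3, 4};
  float same[4] = {};
  CopyRows(same, src, 4, sizeof(float));               // fast path: compiled once
  double converted[4] = {};
  ConvertRows<float, double, /*add_to=*/false>(converted, src, 4);  // one of the remaining cases
}
```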
   
   That leaves 121 cases: of the 8 * 8 * 2 = 128, the 8 same-type kWriteTo cases
become the single copying path, and the remaining 120 cover some combination of
type conversion and/or kAddTo.

