asmushetzel opened a new pull request #9444: enabling multithreading in 
broadcast_reduce
URL: https://github.com/apache/incubator-mxnet/pull/9444
 
 
   ## Description ##
   Add multithreading on CPU for the class of broadcast_reduce operators. For 
unknown reasons, these operators have not used any internal threading so far, 
and they were observed to become a serious runtime bottleneck in an 
application. Threading is applied across the sequences to be reduced, not 
within a single reduce sequence. This follows the pattern already used in 
elemwise_binary_broadcast_op.h (where threading is achieved when launching the 
binary_broadcast_kernel). 
   With this change, these operators thread well whenever the reduction covers 
multiple sequences, and their runtime characteristics match those of 
elemwise_binary_broadcast.
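
   For illustration only, here is a minimal sketch of threading across 
sequences rather than within one. It is not the code from this PR: the function 
`reduce_rows_sum`, its parameters, and the row-major layout are hypothetical, 
and OpenMP is assumed as the threading mechanism.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not MXNet code: sums each row of a row-major
// (num_seq x seq_len) buffer. Each row stands for one sequence to be reduced.
std::vector<float> reduce_rows_sum(const std::vector<float>& data,
                                   std::size_t num_seq,
                                   std::size_t seq_len) {
  assert(data.size() == num_seq * seq_len);
  std::vector<float> out(num_seq, 0.0f);
  // Parallelize over sequences; each thread writes a distinct out[i],
  // so no synchronization is needed. The pragma is ignored (serial loop)
  // when the code is compiled without OpenMP support.
  #pragma omp parallel for
  for (long i = 0; i < static_cast<long>(num_seq); ++i) {
    const std::size_t row = static_cast<std::size_t>(i);
    float acc = 0.0f;
    // The reduction of a single sequence stays serial.
    for (std::size_t j = 0; j < seq_len; ++j) {
      acc += data[row * seq_len + j];
    }
    out[row] = acc;
  }
  return out;
}
```

   In this sketch a call with `num_seq == 1` has a single loop iteration and 
therefore gains nothing from threading, which mirrors the limitation noted 
above: the benefit appears only when reducing over multiple sequences.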
   
   ## Checklist ##
   ### Essentials ###
   - [x] Passed code style checking (`make lint`)
   - [x] Changes are complete (i.e. I finished coding on this PR)
   - [x] To the best of my knowledge, examples are either not affected by this 
change, or have been fixed to be compatible with this change
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
