asmushetzel opened a new pull request #9444: enabling multithreading in broadcast_reduce
URL: https://github.com/apache/incubator-mxnet/pull/9444

## Description ##
Add multithreading on CPU for the class of broadcast_reduce operators. For unknown reasons, this class of operators has not used any internal threading so far, and it was observed to become a serious runtime bottleneck in an application. Threading is done at the level of the sequences to be reduced, not within a single reduce sequence. This matches the pattern already used in elemwise_binary_broadcast_op.h (where threading is achieved when launching the binary_broadcast_kernel). With this change, this class of operators threads well whenever the reduction covers multiple sequences, and it matches the runtime characteristics of elemwise_binary_broadcast.

## Checklist ##
### Essentials ###
- [x] Passed code style checking (`make lint`)
- [x] Changes are complete (i.e. I finished coding on this PR)
- [x] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
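
As an illustration of the threading pattern in the description (parallelism across the sequences to be reduced, with each sequence reduced serially by one thread), here is a minimal sketch. It is not the code from this PR: `reduce_sum_rows`, its parameters, and the row-major layout are hypothetical and chosen only for illustration, and the sketch assumes OpenMP is enabled at compile time (for example with `-fopenmp`); without it the pragma is ignored and the loop runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not the MXNet kernel: sums every row of a row-major
// (num_sequences x seq_len) matrix. Each row is one "reduce sequence".
std::vector<float> reduce_sum_rows(const std::vector<float>& data,
                                   std::size_t num_sequences,
                                   std::size_t seq_len) {
  assert(data.size() == num_sequences * seq_len);
  std::vector<float> out(num_sequences, 0.0f);

  // Threads are distributed over the output elements, i.e. over sequences.
  // A signed loop index keeps the loop valid for older OpenMP versions.
  #pragma omp parallel for
  for (long i = 0; i < static_cast<long>(num_sequences); ++i) {
    const std::size_t row = static_cast<std::size_t>(i);
    float acc = 0.0f;
    // A single thread reduces the whole sequence serially, so no
    // synchronization is needed inside the reduction itself.
    for (std::size_t j = 0; j < seq_len; ++j) {
      acc += data[row * seq_len + j];
    }
    out[row] = acc;  // each thread writes a distinct output element
  }
  return out;
}
```

With this layout, the speedup comes only from having several sequences: reducing a single long sequence still runs on one thread, which is consistent with the description's note that threading is not applied within a single reduce sequence.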
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
