roywei edited a comment on issue #15429: Operator Performance Regression on CPU
URL: https://github.com/apache/incubator-mxnet/issues/15429#issuecomment-508597164

I agree that what impacts actual user experience is the final speed of model inference/training, since operators get fused and other performance improvement techniques are applied to the overall model. Currently we don't have enough data to say whether these op regressions will affect actual model speed.

**Regarding the OP regression, we are focusing on root causing the regression of broadcast ops; the remaining ops should not block the 1.5.0 release.**

We found that regardless of the flag (int32/int64), there is a regression of around 15% on broadcast ops, on both the mxnet-mkl and mxnet pip packages, between 1.4.1 and 1.5.0. I'm still root causing it.

| op | mxnet-mkl 1.4.1 (int64) | mxnet-mkl 1.5.0 int32 | mxnet-mkl 1.5.0 int64 | mxnet-mkl 1.5 int32 vs 1.4.1 | mxnet-mkl 1.5 int64 vs 1.4.1 | mxnet 1.5.0 | mxnet 1.4.1 | mxnet regression |
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| broadcast_add | 0.00242 | 0.00286 | 0.0029 | 18% | 20% | 0.0021 | 0.0016 | 31% |
| broadcast_div | 0.00244 | 0.00286 | 0.00296 | 17% | 21% | 0.0021 | 0.0017 | 24% |
| broadcast_equal | 0.0024 | 0.00294 | 0.00294 | 23% | 23% | 0.0021 | 0.0016 | 31% |
| broadcast_greater | 0.00244 | 0.00282 | 0.00294 | 16% | 20% | 0.0021 | 0.0016 | 31% |
| broadcast_greater_equal | 0.00244 | 0.00284 | 0.00286 | 16% | 17% | 0.002 | 0.0017 | 18% |
| broadcast_hypot | 0.0025 | 0.00292 | 0.00302 | 17% | 21% | 0.0021 | 0.0016 | 31% |
| broadcast_lesser | 0.00242 | 0.00288 | 0.00298 | 19% | 23% | 0.002 | 0.0015 | 33% |
| broadcast_lesser_equal | 0.00246 | 0.00288 | 0.00288 | 17% | 17% | 0.0021 | 0.0017 | 24% |
| broadcast_logical_and | 0.0024 | 0.0029 | 0.00292 | 21% | 22% | 0.0021 | 0.0016 | 31% |
| broadcast_logical_or | 0.0025 | 0.00288 | 0.00288 | 15% | 15% | 0.0021 | 0.0016 | 31% |
| broadcast_logical_xor | 0.00242 | 0.0029 | 0.00296 | 20% | 22% | 0.0021 | 0.0016 | 31% |
| broadcast_maximum | 0.00248 | 0.00288 | 0.00284 | 16% | 15% | 0.0021 | 0.0016 | 31% |
| broadcast_minimum | 0.0025 | 0.00286 | 0.00284 | 14% | 14% | 0.002 | 0.0015 | 33% |
| broadcast_mod | 0.00262 | 0.00294 | 0.00302 | 12% | 15% | 0.0021 | 0.0016 | 31% |
| broadcast_mul | 0.00244 | 0.00288 | 0.00298 | 18% | 22% | 0.0021 | 0.0017 | 24% |
| broadcast_not_equal | 0.00258 | 0.00296 | 0.00294 | 15% | 14% | 0.002 | 0.0016 | 25% |
| broadcast_power | 0.00296 | 0.00336 | 0.00348 | 14% | 18% | 0.0023 | 0.0019 | 21% |
| broadcast_sub | 0.0025 | 0.00296 | 0.003 | 18% | 20% | 0.002 | 0.0016 | 25% |
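
The regression percentages in the table above are consistent with a relative slowdown against the 1.4.1 timing, i.e. `(new - old) / old`. The comment does not state the formula, so that is an assumption, and the helper name `regression_pct` is hypothetical. A minimal sketch that recomputes a few of the mxnet-mkl rows:

```python
def regression_pct(baseline: float, candidate: float) -> float:
    """Relative slowdown of `candidate` versus `baseline`, in percent.

    Assumed formula (not stated in the comment): (candidate - baseline) / baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline timing must be positive")
    return (candidate - baseline) / baseline * 100.0


# Timings copied from the table: op -> (1.4.1 int64, 1.5.0 int32, 1.5.0 int64), mxnet-mkl.
mkl_timings = {
    "broadcast_add": (0.00242, 0.00286, 0.0029),
    "broadcast_mod": (0.00262, 0.00294, 0.00302),
    "broadcast_power": (0.00296, 0.00336, 0.00348),
}

for op, (base, int32, int64) in mkl_timings.items():
    print(
        f"{op}: int32 vs 1.4.1 = {regression_pct(base, int32):.0f}%, "
        f"int64 vs 1.4.1 = {regression_pct(base, int64):.0f}%"
    )
```

Under this assumption the rounded results match the reported columns for these rows (18%/20%, 12%/15%, 14%/18%), and the same function applied to the mxnet columns (1.4.1 as baseline, 1.5.0 as candidate) gives 31% for `broadcast_add`.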
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
