apeforest commented on issue #14496: performance degradation from 1.3.1 to 1.4.0
URL: https://github.com/apache/incubator-mxnet/issues/14496#issuecomment-477378018

@sun-dev is correct. The computation in this operator is performed not on the elements of the tensor but on its shape indices. The transpose operator involves addition, multiplication, and division [here](https://github.com/dmlc/mshadow/blob/757a91c3ca4f5ebf4879739c0871d2d5534465ac/mshadow/extension/transpose.h#L74).

I did a performance comparison of different arithmetic operations between 32-bit and 64-bit integers on CPU. The differences are noticeable, as the output below shows. FYI, you can use [this](https://github.com/apeforest/doraemon/blob/master/perf32vs64.cc) code to reproduce.

```
result = 49995000
Add 32 time in clocks 24869
Add 32 time in ms 1359
result = 49995000
Add 64 time in clocks 6070
Add 64 time in ms 1971
result = 349965000
Add Mul 32 time in clocks 3601
Add Mul 32 time in ms 1196
result = 349965000
Add Mul 64 time in clocks 9967
Add Mul 64 time in ms 3477
result = 7137858
Add Div 32 time in clocks 8273
Add Div 32 time in ms 2878
result = 7137858
Add Div 64 time in clocks 24016
Add Div 64 time in ms 8499
```
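For readers who do not want to open the linked file, here is a minimal sketch of the kind of add-and-divide measurement described above. It is not the code from `perf32vs64.cc`: the helper name `time_add_div`, its parameters, and the use of `std::chrono::steady_clock` are assumptions made for illustration. The loop sums `i / d` over an index type `T`, so the same source can be instantiated for 32-bit and 64-bit integers.

```cpp
#include <chrono>
#include <cstdint>
#include <utility>

// Hypothetical helper (not taken from the linked benchmark): accumulates
// i / divisor for i in [0, n) using index type T and returns
// {accumulated sum, elapsed wall time in milliseconds}.
// The caller must pass a non-zero divisor.
template <typename T>
std::pair<T, double> time_add_div(T n, T divisor) {
  using clock = std::chrono::steady_clock;

  // Reading the divisor through a volatile keeps the compiler from
  // replacing the division with a multiply-and-shift sequence.
  volatile T d = divisor;

  T sum = 0;
  const auto start = clock::now();
  for (T i = 0; i < n; ++i) {
    sum += i / d;  // one add and one integer division per iteration
  }
  const auto stop = clock::now();

  const double ms =
      std::chrono::duration<double, std::milli>(stop - start).count();
  return {sum, ms};
}
```

Calling `time_add_div<std::int32_t>(10000, 7)` and `time_add_div<std::int64_t>(10000, 7)` both give a sum of 7137858, the same value as the `result` lines for Add Div in the output above; whether the linked code uses exactly these parameters is an assumption. A single pass over 10000 elements is too short to time reliably, so in practice the call would be repeated many times and the elapsed times summed.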
