wkcn edited a comment on pull request #18707: URL: https://github.com/apache/incubator-mxnet/pull/18707#issuecomment-659304799
Performance benchmark: transpose operator on CPU, **axes generated randomly**.

ndim | max use (KB) | avg time (ms)
---|---|---
1|12582.9121|1.0786
2|12582.9121|1.0851
3|12582.9121|0.6763
4|12582.9121|1.2172
5|12582.9121|6.4305
6|12582.9121|11.7841
7|12583.3604|65.7184
8|12583.4238|65.2171
9|12583.4883|82.4930

The increase in memory footprint is slight, but the runtime is intolerable when `axes.ndim() > 6`.

If **axes is monotonically increasing** (namely `[0, 1, 2, ..., ndim - 1]`; comment out the `random.shuffle(axes)` line in the test code below):

ndim | max use (KB) | avg time (ms)
---|---|---
1|12582.9121|1.1492
2|12582.9121|1.1732
3|12582.9121|1.3264
4|12582.9121|1.3896
5|12582.9121|0.9107
6|12582.9121|0.8965
7|12583.3604|0.9028
8|12583.4238|0.9105
9|12583.4883|0.8981

If **axes is monotonically decreasing** (namely `[ndim - 1, ndim - 2, ..., 2, 1, 0]`):

ndim | max use (KB) | avg time (ms)
---|---|---
1|12582.9121|1.1290
2|12582.9121|1.1204
3|12582.9121|1.1874
4|12582.9121|1.4240
5|12582.9121|7.7080
6|12582.9121|24.0448
7|12583.3604|115.1126
8|12583.4238|105.9091
9|12583.4883|106.3913

Comparison with NumPy transpose:

ndim | numpy time (s) | mxnet time (s)
---|---|---
1 | 0.1621077060699463 | 0.31803297996520996
2 | 0.2637207508087158 | 0.33347415924072266
3 | 0.4311816692352295 | 0.47667574882507324
4 | 0.5303101539611816 | 0.49021244049072266
5 | 0.5940566062927246 | 1.48443603515625
6 | 0.8220541477203369 | 2.03752064704895
7 | 0.8727006912231445 | 9.488046169281006
8 | 1.0004301071166992 | 9.947605848312378
9 | 1.2341070175170898 | 12.262272119522095

Test code:

```python
import mxnet as mx
from mxnet import profiler
print(mx)
import numpy as np
from numpy.testing import assert_allclose
import time
import random

seed = 42
np.random.seed(seed)
mx.random.seed(seed)

# configure the profiler
profiler.set_config(profile_all=True, aggregate_stats=True,
                    filename='trace_profile.json')

def test_transpose(ndim):
    for t in range(20):
        dims = [4 for _ in range(ndim)]
        dims[-1] *= 4 ** (10 - ndim)
        axes = list(range(ndim))
        random.shuffle(axes)
        axes = tuple(axes)
        x = mx.nd.array(np.random.normal(size=dims))
        y = mx.nd.transpose(x, axes=axes)
        assert_allclose(np.transpose(x.asnumpy(), axes=axes), y.asnumpy())

for ndim in range(1, 10):
    # start the profiler collecting data
    profiler.set_state('run')
    tic = time.time()
    test_transpose(ndim)
    print(ndim, "====", time.time() - tic)

# stop the profiler
profiler.set_state('stop')
# dump the profiling data as a string
print(profiler.dumps(reset=True))
print("Over")
```

Test code, comparison with NumPy:

```python
import mxnet as mx
from mxnet import profiler
print(mx)
import numpy as np
from numpy.testing import assert_allclose
import time
import random

seed = 42
np.random.seed(seed)
mx.random.seed(seed)

def test_transpose(ndim):
    np_time = 0
    mx_time = 0
    for t in range(20):
        dims = [5 for _ in range(ndim)]
        dims[-1] *= 5 ** (10 - ndim)
        axes = list(range(ndim))
        random.shuffle(axes)
        axes = tuple(axes)
        x_np = np.array(np.random.normal(size=dims), dtype=np.float32)
        x_mx = mx.nd.array(x_np, dtype=np.float32)
        # warm-up runs (not timed)
        for _ in range(2):
            y_np = np.transpose(x_np, axes=axes).copy()
            y_mx = mx.nd.transpose(x_mx, axes=axes)
            y_mx.asnumpy()
        tic_np = time.time()
        for _ in range(1):
            y_np = np.transpose(x_np, axes=axes).copy()
        np_time += time.time() - tic_np
        tic_mx = time.time()
        for _ in range(1):
            y_mx = mx.nd.transpose(x_mx, axes=axes)
            y_mx.asnumpy()
        mx_time += time.time() - tic_mx
    print(f"{ndim} | {np_time} | {mx_time}")

for ndim in range(1, 10):
    test_transpose(ndim)
print("Over")
```
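The contrast between the monotonic and shuffled cases hints at a classic optimization: coalesce input axes that remain adjacent (and in order) after the permutation, so an n-D transpose runs as a lower-dimensional one. A monotonically increasing permutation then collapses to a plain 1-D copy, while a fully reversed permutation cannot be coalesced at all, which matches the slow decreasing case in the tables above. A minimal sketch of the idea (`coalesce_axes` is a hypothetical helper for illustration, not part of this PR):

```python
import numpy as np

def coalesce_axes(shape, axes):
    """Merge input axes that stay adjacent after the permutation.

    Returns a (new_shape, new_axes) pair describing an equivalent
    transpose of lower (or equal) dimensionality.
    """
    # Group consecutive runs in the permutation, e.g. (2, 0, 1) -> [[2], [0, 1]].
    groups = [[axes[0]]]
    for a in axes[1:]:
        if a == groups[-1][-1] + 1:
            groups[-1].append(a)
        else:
            groups.append([a])
    # Each group of axes becomes one merged input axis, keyed by its first axis.
    merged = {g[0]: int(np.prod([shape[a] for a in g])) for g in groups}
    order = sorted(merged)  # merged axes in original input order
    new_shape = [merged[a] for a in order]
    new_axes = [order.index(g[0]) for g in groups]
    return new_shape, new_axes
```

For example, shape `(4, 4, 4)` with `axes = (2, 0, 1)` reduces to a 2-D transpose of a `(16, 4)` array, and `axes = (0, 1, 2)` reduces to a single 1-D copy of 64 elements; `axes = (2, 1, 0)` stays 3-D.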