bgawrych opened a new pull request #20621: URL: https://github.com/apache/incubator-mxnet/pull/20621
## Description ##
Improves the performance of the stack operation. The results show a significant speedup for axis=0 (up to about 7x, for shape (4096, 512)). For the other axes the gains are smaller (up to about 2x), and a few of the smallest inputs are slightly slower.

Performance results were collected on CLX8280 with `KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0 OMP_NUM_THREADS=28 numactl --physcpubind=0-27 --membind=0`. Each time is the total wall-clock time in seconds for 100 `np.stack` calls on four float32 inputs of the given shape, as printed by the benchmark script below.

| shape | axis | master time [s] | oneDNN time [s] |
| -- | -- | -- | -- |
| (128, 128) | 0 | 0.007561 | 0.008217 |
| (128, 128) | 1 | 0.004158 | 0.00457 |
| (128, 512) | 0 | 0.014108 | 0.007263 |
| (128, 512) | 1 | 0.004416 | 0.005567 |
| (128, 1024) | 0 | 0.024753 | 0.009431 |
| (128, 1024) | 1 | 0.0046 | 0.004892 |
| (128, 4096) | 0 | 0.088938 | 0.025933 |
| (128, 4096) | 1 | 0.006305 | 0.006167 |
| (512, 128) | 0 | 0.012593 | 0.006721 |
| (512, 128) | 1 | 0.004545 | 0.00462 |
| (512, 512) | 0 | 0.043897 | 0.01301 |
| (512, 512) | 1 | 0.005042 | 0.005218 |
| (512, 1024) | 0 | 0.079853 | 0.016997 |
| (512, 1024) | 1 | 0.006117 | 0.006382 |
| (512, 4096) | 0 | 0.517834 | 0.097284 |
| (512, 4096) | 1 | 0.070154 | 0.038691 |
| (1024, 128) | 0 | 0.022151 | 0.008327 |
| (1024, 128) | 1 | 0.004755 | 0.004991 |
| (1024, 512) | 0 | 0.080592 | 0.017348 |
| (1024, 512) | 1 | 0.006391 | 0.006452 |
| (1024, 1024) | 0 | 0.205667 | 0.040287 |
| (1024, 1024) | 1 | 0.013286 | 0.013144 |
| (1024, 4096) | 0 | 1.159914 | 0.267409 |
| (1024, 4096) | 1 | 0.174798 | 0.153152 |
| (4096, 128) | 0 | 0.081543 | 0.017331 |
| (4096, 128) | 1 | 0.006936 | 0.006952 |
| (4096, 512) | 0 | 0.575121 | 0.079814 |
| (4096, 512) | 1 | 0.084379 | 0.040853 |
| (4096, 1024) | 0 | 1.244555 | 0.251577 |
| (4096, 1024) | 1 | 0.1782 | 0.154799 |
| (4096, 4096) | 0 | 5.169306 | 1.180926 |
| (4096, 4096) | 1 | 0.766602 | 0.740192 |
| (32, 128, 128) | 0 | 0.080957 | 0.017508 |
| (32, 128, 128) | 1 | 0.00692 | 0.006721 |
| (32, 128, 128) | 2 | 0.006921 | 0.006859 |
| (32, 128, 512) | 0 | 0.555404 | 0.081633 |
| (32, 128, 512) | 1 | 0.077143 | 0.037545 |
| (32, 128, 512) | 2 | 0.083525 | 0.041425 |
| (32, 128, 1024) | 0 | 1.225558 | 0.255515 |
| (32, 128, 1024) | 1 | 0.190202 | 0.154146 |
| (32, 128, 1024) | 2 | 0.177495 | 0.1549 |
| (32, 128, 4096) | 0 | 5.006225 | 1.090737 |
| (32, 128, 4096) | 1 | 0.831286 | 0.759118 |
| (32, 128, 4096) | 2 | 0.765793 | 0.742179 |
| (32, 512, 128) | 0 | 0.560635 | 0.090112 |
| (32, 512, 128) | 1 | 0.076585 | 0.042584 |
| (32, 512, 128) | 2 | 0.095465 | 0.04338 |
| (32, 512, 512) | 0 | 2.536246 | 0.541157 |
| (32, 512, 512) | 1 | 0.397728 | 0.341854 |
| (32, 512, 512) | 2 | 0.407399 | 0.35051 |
| (32, 512, 1024) | 0 | 5.034069 | 1.092211 |
| (32, 512, 1024) | 1 | 0.830025 | 0.760654 |
| (32, 512, 1024) | 2 | 0.772979 | 0.740602 |
| (32, 512, 4096) | 0 | 20.72267 | 4.655413 |
| (32, 512, 4096) | 1 | 3.503717 | 3.075174 |
| (32, 512, 4096) | 2 | 3.02452 | 3.002688 |
| (32, 1024, 128) | 0 | 1.196986 | 0.24314 |
| (32, 1024, 128) | 1 | 0.190801 | 0.15396 |
| (32, 1024, 128) | 2 | 0.220551 | 0.154176 |
| (32, 1024, 512) | 0 | 5.024947 | 1.09671 |
| (32, 1024, 512) | 1 | 0.828758 | 0.768377 |
| (32, 1024, 512) | 2 | 0.842505 | 0.748963 |
| (32, 1024, 1024) | 0 | 10.38974 | 2.242758 |
| (32, 1024, 1024) | 1 | 1.869875 | 1.547855 |
| (32, 1024, 1024) | 2 | 1.538614 | 1.496964 |
| (32, 1024, 4096) | 0 | 41.49604 | 9.043207 |
| (32, 1024, 4096) | 1 | 7.476183 | 6.120244 |
| (32, 1024, 4096) | 2 | 6.035282 | 6.005883 |
| (32, 4096, 128) | 0 | 5.003981 | 1.093519 |
| (32, 4096, 128) | 1 | 0.826404 | 0.769504 |
| (32, 4096, 128) | 2 | 0.926172 | 0.751195 |
| (32, 4096, 512) | 0 | 20.09485 | 4.335339 |
| (32, 4096, 512) | 1 | 3.502722 | 3.074266 |
| (32, 4096, 512) | 2 | 3.359437 | 3.006657 |
| (32, 4096, 1024) | 0 | 40.23293 | 8.423752 |
| (32, 4096, 1024) | 1 | 7.462769 | 6.156365 |
| (32, 4096, 1024) | 2 | 6.141823 | 6.012172 |
| (32, 4096, 4096) | 0 | 154.2752 | 35.87757 |
| (32, 4096, 4096) | 1 | 28.58106 | 24.12571 |
| (32, 4096, 4096) | 2 | 24.46488 | 23.94641 |

Benchmark script:

```python
import mxnet
import mxnet.gluon.nn as nn
import mxnet.numpy as np
import time


class TestStack(nn.HybridBlock):
    def __init__(self, axis=None):
        super(TestStack, self).__init__()
        self._axis = axis

    def forward(self, a, *args):
        return np.stack([a] + list(args), axis=self._axis)


dims = [128, 512, 1024, 4096]
print("shape;axis;time")
# ndim == 0 -> 2-D inputs (dim1, dim2); ndim == 1 -> 3-D inputs (32, dim1, dim2)
for ndim in range(2):
    for dim1 in dims:
        for dim2 in dims:
            shape = (dim1, dim2) if ndim == 0 else (32, dim1, dim2)
            a = np.random.uniform(-1.0, 1.0, shape).astype(np.float32)
            b = np.random.uniform(-1.0, 1.0, shape).astype(np.float32)
            c = np.random.uniform(-1.0, 1.0, shape).astype(np.float32)
            d = np.random.uniform(-1.0, 1.0, shape).astype(np.float32)
            # Stack along every valid input axis: 0..1 for 2-D, 0..2 for 3-D
            for axis in range(2 + ndim):
                # Note: the hybridized block is created here, but the timed
                # loop below calls np.stack directly and never invokes it.
                stack = TestStack(axis)
                stack.hybridize()
                tic = time.time()
                for i in range(100):
                    out = np.stack([a, b, c, d], axis=axis)
                # MXNet executes asynchronously: block until the result is
                # ready before stopping the timer.
                out.wait_to_read()
                toc = time.time()
                # Total wall-clock seconds for the 100 calls
                print(f"{shape};{axis};{toc-tic}")
```
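The headline "up to 7x" figure is simply the ratio of the master time to the oneDNN time for a given row. As a minimal sketch (the names `ROWS`, `speedup` and `best_per_axis_group` are hypothetical helpers, not part of MXNet), the snippet recomputes that ratio for a handful of rows copied from the results table and picks the best row for axis=0 and for the remaining axes. A speedup above 1 means the oneDNN column is faster.

```python
# A few rows copied from the results table:
# (shape, axis, master time [s], oneDNN time [s]) for 100 np.stack calls.
ROWS = [
    ((128, 128), 0, 0.007561, 0.008217),
    ((128, 512), 1, 0.004416, 0.005567),
    ((4096, 512), 0, 0.575121, 0.079814),
    ((4096, 512), 1, 0.084379, 0.040853),
    ((32, 128, 512), 0, 0.555404, 0.081633),
    ((32, 512, 128), 2, 0.095465, 0.04338),
    ((32, 4096, 4096), 0, 154.2752, 35.87757),
]


def speedup(master: float, onednn: float) -> float:
    """Ratio of the two timings; values above 1 mean oneDNN is faster."""
    return master / onednn


def best_per_axis_group(rows):
    """Return (speedup, shape, axis) of the fastest row for axis == 0 and axis > 0."""
    groups = {"axis=0": [], "axis>0": []}
    for shape, axis, master, onednn in rows:
        key = "axis=0" if axis == 0 else "axis>0"
        groups[key].append((speedup(master, onednn), shape, axis))
    # Tuples compare by their first element (the speedup) first.
    return {key: max(values) for key, values in groups.items()}


if __name__ == "__main__":
    for shape, axis, master, onednn in ROWS:
        print(f"{shape};{axis};{speedup(master, onednn):.2f}x")
    for key, (ratio, shape, axis) in best_per_axis_group(ROWS).items():
        print(f"best {key}: {shape} axis={axis} -> {ratio:.2f}x")
```

Running it prints one `shape;axis;speedup` line per row, mirroring the benchmark's `shape;axis;time` output, followed by the best row in each group. Extending `ROWS` with the remaining table rows gives the full per-row comparison.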
