ciyongch commented on issue #15429: Operator Performance Regression on CPU
URL: https://github.com/apache/incubator-mxnet/issues/15429#issuecomment-507700562

@roywei I collected performance data for `Dropout`, `relu`, and `dot` on a C5.18xlarge instance. Without binding cores, `dot` and `Dropout` show large run-to-run variance (Table 1). After binding cores to a socket with the commands below before running the benchmark, I got stable and better results (Table 2).
```
export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
export OMP_NUM_THREADS=18
```
I ran each benchmark 5 times, with 10 warmup iterations plus 100 runs each time. I'm not sure whether you set these environment variables when running the benchmark. I suggest re-running the ops several times to see whether it's a real degradation.
As for the `relu` operator performing worse on v1.5: I found that v1.5 includes commit c64559178 (#14262), which added an "IsNan(a)" check on the data and introduced some overhead.
Will add more data and more operator results later.

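
The measurement protocol described above (5 repetitions, each with 10 warmup iterations followed by 100 timed runs) can be sketched roughly as follows. This is a minimal, framework-agnostic sketch and not the script used for the tables: `benchmark` and `op` are hypothetical names, and a real MXNet measurement would also need to wait for asynchronous execution to finish before stopping the timer. The affinity variables are typically read when the OpenMP runtime initializes, so exporting them in the shell before launching the process is the safest route.

```python
import time
from statistics import mean


def benchmark(op, warmup=10, runs=100, repeats=5):
    """Return the average time per call in ms, one value per repetition."""
    averages_ms = []
    for _ in range(repeats):
        # Warmup calls are executed but not timed.
        for _ in range(warmup):
            op()
        timings = []
        for _ in range(runs):
            start = time.perf_counter()
            op()
            timings.append((time.perf_counter() - start) * 1e3)
        averages_ms.append(mean(timings))
    return averages_ms


# Hypothetical stand-in workload; replace with the operator call under test.
results = benchmark(lambda: sum(range(1000)))
print(["%.4f" % t for t in results])
```

Reporting one average per repetition, rather than a single grand mean, is what makes run-to-run variance visible in the first place.
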
**Table 1**

| Operator | Avg Forward Time (ms) 1.4.1 | Avg Forward Time (ms) 1.5.0 | Regression | data shape |
|----------|-----------------------------|-----------------------------|------------|----------------------------------------------------------------------------------|
| dot | 0.1791 | 0.1576 | 12.0% | {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_b': True, 'transpose_a': True} |
| dot | 0.2505 | 0.1146 | 54.3% | {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_b': True, 'transpose_a': True} |
| dot | 0.1798 | 0.2531 | -40.8% | {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_b': True, 'transpose_a': True} |
| dot | 0.1529 | 0.2786 | -82.2% | {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_b': True, 'transpose_a': True} |
| dot | 0.4024 | 0.0763 | 81.0% | {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_b': True, 'transpose_a': True} |
| dot | 0.1824 | 0.3437 | -88.4% | {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} |
| dot | 0.1107 | 0.2491 | -125.0% | {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} |
| dot | 0.4099 | 0.2401 | 41.4% | {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} |
| dot | 1.4348 | 0.2439 | 83.0% | {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} |
| dot | 0.3124 | 0.2922 | 6.5% | {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} |
| Dropout | 0.0551 | 0.0477 | 13.4% | {'mode': 'always', 'data': (10000, 10), 'p': 0.5} |
| Dropout | 0.0446 | 0.0462 | -3.6% | {'mode': 'always', 'data': (10000, 10), 'p': 0.5} |
| Dropout | 0.078 | 0.0567 | 27.3% | {'mode': 'always', 'data': (10000, 10), 'p': 0.5} |
| Dropout | 0.0311 | 0.0797 | -156.3% | {'mode': 'always', 'data': (10000, 10), 'p': 0.5} |
| Dropout | 0.3707 | 0.0671 | 81.9% | {'mode': 'always', 'data': (10000, 10), 'p': 0.5} |

**Table 2**

| Operator | Avg Forward Time (ms) 1.4.1 | Avg Forward Time (ms) 1.5.0 | Regression | data shape |
|----------|-----------------------------|-----------------------------|------------|----------------------------------------------------------------------------------|
| dot | 0.0079 | 0.0087 | -10.1% | {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_b': True, 'transpose_a': True} |
| dot | 0.0076 | 0.0089 | -17.1% | {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_b': True, 'transpose_a': True} |
| dot | 0.0079 | 0.009 | -13.9% | {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_b': True, 'transpose_a': True} |
| dot | 0.008 | 0.0093 | -16.3% | {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_b': True, 'transpose_a': True} |
| dot | 0.0078 | 0.0089 | -14.1% | {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_b': True, 'transpose_a': True} |
| dot | 0.0946 | 0.0895 | 5.4% | {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} |
| dot | 0.0937 | 0.0889 | 5.1% | {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} |
| dot | 0.093 | 0.0895 | 3.8% | {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} |
| dot | 0.097 | 0.0898 | 7.4% | {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} |
| dot | 0.0908 | 0.089 | 2.0% | {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} |
| Dropout | 0.0238 | 0.0235 | 1.3% | {'mode': 'always', 'data': (10000, 10), 'p': 0.5} |
| Dropout | 0.0244 | 0.0234 | 4.1% | {'mode': 'always', 'data': (10000, 10), 'p': 0.5} |
| Dropout | 0.024 | 0.0234 | 2.5% | {'mode': 'always', 'data': (10000, 10), 'p': 0.5} |
| Dropout | 0.024 | 0.0235 | 2.1% | {'mode': 'always', 'data': (10000, 10), 'p': 0.5} |
| Dropout | 0.022 | 0.0234 | -6.4% | {'mode': 'always', 'data': (10000, 10), 'p': 0.5} |

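
The Regression column in both tables is consistent with (t_1.4.1 - t_1.5.0) / t_1.4.1, so positive values mean 1.5.0 is faster and negative values mean it is slower. The sketch below recomputes that figure and a simple max/min spread for the first `dot` shape, using numbers copied from the tables. The helper names `regression_pct` and `spread` are illustrative only and not part of any benchmark tool.

```python
def regression_pct(t_old, t_new):
    """Relative change in percent; negative means the new version is slower."""
    return (t_old - t_new) / t_old * 100.0


def spread(times):
    """Ratio of slowest to fastest run; 1.0 means perfectly repeatable."""
    return max(times) / min(times)


# 1.4.1 forward times (ms) for dot with lhs=(1000, 1), rhs=(100, 1000).
unbound = [0.1791, 0.2505, 0.1798, 0.1529, 0.4024]  # Table 1
bound = [0.0079, 0.0076, 0.0079, 0.008, 0.0078]     # Table 2

print("first Table 1 row: %.1f%%" % regression_pct(0.1791, 0.1576))
print("spread without binding: %.2fx" % spread(unbound))
print("spread with binding:    %.2fx" % spread(bound))
```

For this shape the slowest unbound run is roughly 2.6 times the fastest, while with cores bound the ratio stays near 1.05, so the per-row percentages in Table 1 are dominated by noise rather than by a version difference.
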
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
