[GitHub] [incubator-mxnet] ciyongch commented on issue #15429: Operator Performance Regression on CPU

GitBox Tue, 02 Jul 2019 07:24:44 -0700

ciyongch commented on issue #15429: Operator Performance Regression on CPU
URL: https://github.com/apache/incubator-mxnet/issues/15429#issuecomment-507700562
 
 
   @roywei I've collected some performance data for `Dropout`, `relu`, and `dot` on a C5.18xlarge instance. Without binding cores, I saw large run-to-run variance for `dot` and `Dropout`, as shown in Table 1.
   After binding the threads to the cores of one socket with the settings below before running the benchmark, I got stable and better results, as shown in Table 2.
   ```
   export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
   export OMP_NUM_THREADS=18
   ```
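
On a C5.18xlarge, `OMP_NUM_THREADS=18` presumably corresponds to the physical cores of one of its two sockets. Rather than hard-coding that number, it could be read from `lscpu`. A minimal sketch, assuming a Linux host whose `lscpu` prints an English `Core(s) per socket:` line; the function name `cores_per_socket` is hypothetical and not part of MXNet or its benchmark scripts:

```shell
# Hypothetical helper (not part of MXNet): read lscpu-style "Key: value"
# lines on stdin and print the number of physical cores per socket.
cores_per_socket() {
  awk -F: '{
    key = $1
    sub(/^[ \t]+/, "", key)      # newer lscpu versions indent nested keys
    if (key == "Core(s) per socket") {
      val = $2
      gsub(/[ \t]/, "", val)     # drop the column padding
      print val
      exit
    }
  }'
}

# Usage sketch, assuming lscpu is installed and prints English labels:
#   export OMP_NUM_THREADS="$(lscpu | cores_per_socket)"
#   export KMP_AFFINITY=granularity=fine,noduplicates,compact,1,0
```

This only sizes the thread pool; the pinning itself still comes from `KMP_AFFINITY`, which is an Intel OpenMP runtime setting.
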
   I ran each benchmark 5 times, each with 10 warmup runs followed by 100 timed runs. I'm not sure whether you set these environment variables when running the benchmark. I'd also suggest re-running each op several times to check whether a slowdown is a real degradation rather than run-to-run noise.
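
Whether a difference between versions is real or just noise can be checked by summarizing the repeated runs of each configuration. A minimal sketch, assuming one average forward time (ms) per line on stdin; `summarize_runs` is a hypothetical name, and the coefficient of variation (standard deviation divided by mean) is just one reasonable noise measure, not something the benchmark itself reports:

```shell
# Hypothetical helper: summarize repeated timings (one value in ms per line
# on stdin) as mean, sample standard deviation and coefficient of variation.
summarize_runs() {
  awk '
    NF { x[++n] = $1; sum += $1 }        # skip blank lines
    END {
      if (n < 2) { print "need at least 2 runs" > "/dev/stderr"; exit 1 }
      mean = sum / n
      for (i = 1; i <= n; i++) ss += (x[i] - mean) ^ 2
      sd = sqrt(ss / (n - 1))            # sample standard deviation
      printf "mean=%.4f sd=%.4f cv=%.1f%%\n", mean, sd, 100 * sd / mean
    }'
}

# Five 1.5.0 Dropout timings from Table 2 (cores bound):
printf '%s\n' 0.0235 0.0234 0.0234 0.0235 0.0234 | summarize_runs
# prints: mean=0.0234 sd=0.0001 cv=0.2%
```

The corresponding five unbound 1.5.0 `Dropout` timings from Table 1 (`0.0477 0.0462 0.0567 0.0797 0.0671`) give `cv=23.6%`, which makes the run-to-run variance without core binding easy to see.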
   
   As for the `relu` operator performing worse on v1.5: v1.5 includes commit c64559178 (#14262), which added an `IsNan(a)` check on the input data, and that check introduces some overhead. I will add more data and results for more operators later.
   
   **Table 1**

   | Operator | Avg Forward Time (ms) 1.4.1 | Avg Forward Time (ms) 1.5.0 | Regression (negative = 1.5.0 slower) | Inputs and parameters |
   |----------|-----------------------------|-----------------------------|--------------------------------------|-----------------------|
   | dot      | 0.1791 | 0.1576 | 12.0%   | {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_b': True, 'transpose_a': True} |
   | dot      | 0.2505 | 0.1146 | 54.3%   | {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_b': True, 'transpose_a': True} |
   | dot      | 0.1798 | 0.2531 | -40.8%  | {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_b': True, 'transpose_a': True} |
   | dot      | 0.1529 | 0.2786 | -82.2%  | {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_b': True, 'transpose_a': True} |
   | dot      | 0.4024 | 0.0763 | 81.0%   | {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_b': True, 'transpose_a': True} |
   | dot      | 0.1824 | 0.3437 | -88.4%  | {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} |
   | dot      | 0.1107 | 0.2491 | -125.0% | {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} |
   | dot      | 0.4099 | 0.2401 | 41.4%   | {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} |
   | dot      | 1.4348 | 0.2439 | 83.0%   | {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} |
   | dot      | 0.3124 | 0.2922 | 6.5%    | {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} |
   | Dropout  | 0.0551 | 0.0477 | 13.4%   | {'mode': 'always', 'data': (10000, 10), 'p': 0.5} |
   | Dropout  | 0.0446 | 0.0462 | -3.6%   | {'mode': 'always', 'data': (10000, 10), 'p': 0.5} |
   | Dropout  | 0.078  | 0.0567 | 27.3%   | {'mode': 'always', 'data': (10000, 10), 'p': 0.5} |
   | Dropout  | 0.0311 | 0.0797 | -156.3% | {'mode': 'always', 'data': (10000, 10), 'p': 0.5} |
   | Dropout  | 0.3707 | 0.0671 | 81.9%   | {'mode': 'always', 'data': (10000, 10), 'p': 0.5} |
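
The Regression column appears to be computed relative to the 1.4.1 time as (t_1.4.1 - t_1.5.0) / t_1.4.1 * 100; this is inferred from the numbers in the table, not stated in the comment. Positive values therefore mean 1.5.0 is faster and negative values mean it is slower. A minimal sketch of that arithmetic, where `regression_pct` is a hypothetical name:

```shell
# Hypothetical helper: percentage change of the 1.5.0 time relative to 1.4.1,
# using the sign convention the tables appear to follow
# (positive = 1.5.0 faster, negative = 1.5.0 slower).
regression_pct() {
  awk -v t141="$1" -v t150="$2" 'BEGIN { printf "%.1f%%\n", (t141 - t150) / t141 * 100 }'
}

regression_pct 0.1791 0.1576   # first dot row of Table 1, prints 12.0%
regression_pct 0.1107 0.2491   # a dot row where 1.5.0 is slower, prints -125.0%
```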
   
   **Table 2**

   | Operator | Avg Forward Time (ms) 1.4.1 | Avg Forward Time (ms) 1.5.0 | Regression (negative = 1.5.0 slower) | Inputs and parameters |
   |----------|-----------------------------|-----------------------------|--------------------------------------|-----------------------|
   | dot      | 0.0079 | 0.0087 | -10.1% | {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_b': True, 'transpose_a': True} |
   | dot      | 0.0076 | 0.0089 | -17.1% | {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_b': True, 'transpose_a': True} |
   | dot      | 0.0079 | 0.009  | -13.9% | {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_b': True, 'transpose_a': True} |
   | dot      | 0.008  | 0.0093 | -16.3% | {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_b': True, 'transpose_a': True} |
   | dot      | 0.0078 | 0.0089 | -14.1% | {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_b': True, 'transpose_a': True} |
   | dot      | 0.0946 | 0.0895 | 5.4%   | {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} |
   | dot      | 0.0937 | 0.0889 | 5.1%   | {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} |
   | dot      | 0.093  | 0.0895 | 3.8%   | {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} |
   | dot      | 0.097  | 0.0898 | 7.4%   | {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} |
   | dot      | 0.0908 | 0.089  | 2.0%   | {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} |
   | Dropout  | 0.0238 | 0.0235 | 1.3%   | {'mode': 'always', 'data': (10000, 10), 'p': 0.5} |
   | Dropout  | 0.0244 | 0.0234 | 4.1%   | {'mode': 'always', 'data': (10000, 10), 'p': 0.5} |
   | Dropout  | 0.024  | 0.0234 | 2.5%   | {'mode': 'always', 'data': (10000, 10), 'p': 0.5} |
   | Dropout  | 0.024  | 0.0235 | 2.1%   | {'mode': 'always', 'data': (10000, 10), 'p': 0.5} |
   | Dropout  | 0.022  | 0.0234 | -6.4%  | {'mode': 'always', 'data': (10000, 10), 'p': 0.5} |


