apeforest commented on issue #16735: Use single-bit for mask in dropout operator
URL: https://github.com/apache/incubator-mxnet/pull/16735#issuecomment-584304458

@eric-haibin-lin The big slowdown was caused by dynamic loop scheduling in OpenMP when assigning a chunk of 8 to each thread. After I changed it to static scheduling, performance improved, but this PR is still slower than master because of the extra bit operations needed in the kernel and because we have to assign a batch of 8 elements at a time when parallelizing the for loop.

@TaoLv Any other suggestions to speed up the OpenMP section? Thanks.

This PR:
```
[{'Dropout': [{'avg_time_Dropout': 3.4434730419889092, 'p50_time_Dropout': 3.2913365866988897, 'p90_time_Dropout': 4.017965029925109, 'p99_time_Dropout': 4.7498174966312945, 'inputs': {'data': (1024, 1024)}}]}]
```

Master:
```
[{'Dropout': [{'avg_time_Dropout': 2.1684831893071532, 'p50_time_Dropout': 1.962667447514832, 'p90_time_Dropout': 2.8079885290935636, 'p99_time_Dropout': 2.967591730412097, 'inputs': {'data': (1024, 1024)}}]}]
```
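
For readers following the thread, here is a minimal sketch of the loop shape described above. It is not the code from this PR: the function name `BitMaskDropout`, its signature, and the per-byte random number generator are all hypothetical and chosen only for illustration. It assumes the batch of 8 exists so that each loop iteration owns one whole mask byte, which keeps two threads from writing to the same byte, and it uses `schedule(static)` so that the bytes are divided among threads once, up front.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper, NOT the MXNet kernel: applies dropout to `in`, writes
// the scaled result to `out`, and stores each element's keep/drop decision as
// a single bit in `mask` (8 elements per byte).
void BitMaskDropout(const std::vector<float>& in, std::vector<float>* out,
                    std::vector<std::uint8_t>* mask, float pkeep,
                    unsigned seed) {
  assert(pkeep > 0.0f && pkeep <= 1.0f);
  const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(in.size());
  const std::ptrdiff_t nbytes = (n + 7) / 8;  // round up to whole bytes
  out->assign(in.size(), 0.0f);
  mask->assign(static_cast<std::size_t>(nbytes), 0);

  // One iteration per mask byte (a batch of 8 elements), so no two threads
  // write to the same byte. schedule(static) gives every thread a fixed
  // contiguous range of bytes instead of handing out chunks at run time.
#pragma omp parallel for schedule(static)
  for (std::ptrdiff_t b = 0; b < nbytes; ++b) {
    // Assumption for illustration only: a cheap generator seeded per byte.
    // It is deterministic but not statistically strong.
    std::minstd_rand gen(seed + static_cast<unsigned>(b) + 1u);
    std::bernoulli_distribution keep(pkeep);
    std::uint8_t bits = 0;
    for (int k = 0; k < 8; ++k) {
      const std::ptrdiff_t i = b * 8 + k;
      if (i >= n) break;  // the last byte may be only partially used
      if (keep(gen)) {
        bits |= static_cast<std::uint8_t>(1u << k);  // extra bit operation
        (*out)[static_cast<std::size_t>(i)] =
            in[static_cast<std::size_t>(i)] / pkeep;
      }
    }
    (*mask)[static_cast<std::size_t>(b)] = bits;
  }
}
```

Compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result. The shift and OR per element are the kind of extra bit operations the comment refers to; a byte-per-element mask would not need them.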
