ChaiBapchya commented on issue #17980:
URL: https://github.com/apache/incubator-mxnet/issues/17980#issuecomment-623803913


   > In case somebody finds this issue and wants their optimized build, here is
   > a different workaround that removes the need for `LD_PRELOAD`. Just do this
   > before running cmake the first time:
   > 
   > ```shell
   > export CXXFLAGS="${CXXFLAGS} -DUSE_MKL -I/opt/intel/mkl/include"
   > ```
   > 
   > Then `cmake` can be run normally:
   > 
   > ```shell
   > cmake -GNinja -DUSE_CUDA=OFF -DCMAKE_BUILD_TYPE=Release ..
   > ```
   > 
   > and the compiled MXNet can be run normally without any special environment
   > variables.
   
   @kpuatamazon Hi, I was trying to benchmark MKL [default] vs. the workaround
   using opperf. Even after making sure MKL was installed and exporting
   `CXXFLAGS` as above before the usual cmake command, the build failed with
   ```
   gemm.cpp:(.text+0xb45): undefined reference to `cblas_gemm_s8u8s32'
   ```
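
   One way to narrow down an undefined reference like this is to check whether
   the MKL library being linked exports the symbol at all. The sketch below is
   only an illustration: `has_symbol` is a hypothetical helper (not part of
   MXNet or MKL), it assumes binutils `nm` is available, and the library path
   is the one used elsewhere in this comment. If the symbol is missing, the
   installed MKL or the library actually picked up by the linker may be the
   problem rather than the compile flags.

   ```shell
   # Hypothetical helper: succeeds if shared library $1 exports symbol $2.
   # Returns 2 if the library cannot be read, 1 if the symbol is not found.
   has_symbol() {
     lib="$1"
     sym="$2"
     [ -r "$lib" ] || return 2
     # nm -D lists dynamic symbols; the symbol name is the last field.
     nm -D --defined-only "$lib" 2>/dev/null |
       awk -v s="$sym" '$NF == s { found = 1 } END { exit !found }'
   }

   # Assumed install location, taken from the commands in this comment.
   if has_symbol /opt/intel/mkl/lib/intel64/libmkl_rt.so cblas_gemm_s8u8s32; then
     echo "symbol exported"
   else
     echo "symbol not found (or library not readable)"
   fi
   ```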
   
   I tried the undocumented abominable kludge option you mentioned and that 
worked smoothly.
   ```
   export LD_PRELOAD=/opt/intel/mkl/lib/intel64/libmkl_rt.so
   rm -rf build/
   mkdir -p build && cd build
   cmake -GNinja -DUSE_CUDA=OFF -DCMAKE_BUILD_TYPE=Release -D_DNNL_USE_MKL=FULL \
     -DMKLINC=/opt/intel/mkl/include ..
   cmake --build . --parallel 1024
   ```
   
   Script for OpPerf: https://gist.github.com/ChaiBapchya/5f2342f75ddeb1e21f14acac665c76ad
   
   Results
   | Operator  | LHS            | RHS            | MKL Default | MKL Workaround |
   |-----------|----------------|----------------|-------------|----------------|
   | Dot       | (4, 512, 512)  | (4, 512, 512)  | 15.1122     | 4.1254         |
   |           | (5, 512, 512)  | (5, 512, 512)  | 38.1678     | 7.5323         |
   |           | (5, 512, 1536) | (5, 512, 1536) | 21.6601     | 19.2503        |
   |           | (5, 512, 2048) | (5, 512, 2048) | 29.0369     | 23.7432        |
   |           | (5, 2048, 512) | (5, 2048, 512) | 167.5528    | 129.9957       |
   | Batch_dot | (4, 512, 512)  | (4, 512, 512)  | 1.7898      | 1.5445         |
   |           | (5, 512, 512)  | (5, 512, 512)  | 2.2457      | 1.9361         |
   |           | (5, 512, 1536) | (5, 512, 1536) | 6.1453      | 5.4034         |
   |           | (5, 512, 2048) | (5, 512, 2048) | 8.246       | 8.0442         |
   |           | (5, 2048, 512) | (5, 2048, 512) | 160.6243    | 29.0772        |
   | FC        | (4, 512)       | (512, 512)     | 0.0609      | 0.068          |
   |           | (5, 512)       | (512, 512)     | 0.0633      | 0.0731         |
   |           | (5, 512)       | (1536, 512)    | 0.0916      | 0.0996         |
   |           | (5, 512)       | (2048, 512)    | 0.1081      | 0.1084         |
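
   To compare the two columns more directly, the ratio of default to
   workaround can be computed per row. This is a small sketch, assuming the
   numbers are timings where lower is better (the table does not state units);
   `speedup` is a hypothetical helper name, and the three rows fed to it are
   copied from the table.

   ```shell
   # speedup BASELINE CANDIDATE: prints BASELINE / CANDIDATE to two decimals.
   # A value above 1 means the candidate (workaround) is faster.
   speedup() {
     awk -v a="$1" -v b="$2" 'BEGIN { if (b == 0) exit 1; printf "%.2f\n", a / b }'
   }

   speedup 15.1122 4.1254     # Dot, (4, 512, 512)
   speedup 160.6243 29.0772   # Batch_dot, (5, 2048, 512)
   speedup 0.0609 0.068       # FC, (4, 512); below 1 means the workaround is slower
   ```

   Because the result is a ratio, it does not depend on the unit the
   benchmark reports.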
   
   However, @kpuatamazon, when I tried to test the default build after unsetting
   the `LD_PRELOAD` environment variable, the build failed with
   ``gemm.cpp:(.text+0xe6b): undefined reference to `cblas_gemm_s8u8s32'``
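
   When switching between the workaround and the default build, one thing
   worth ruling out is leftover state: CMake keeps `-D` settings such as
   `MKLINC` in `CMakeCache.txt`, and `LD_PRELOAD` may still be exported in the
   shell. The check below is only a sketch of that idea, not a confirmed cause
   of the failure above; `check_clean_build_env` is a hypothetical helper and
   `build` is the directory name used in the commands earlier.

   ```shell
   # Hypothetical pre-build check: report leftovers from a previous build.
   # Returns 0 if the environment looks clean, 1 otherwise.
   check_clean_build_env() {
     build_dir="${1:-build}"
     status=0
     if [ -n "${LD_PRELOAD:-}" ]; then
       echo "LD_PRELOAD is still set: ${LD_PRELOAD}"
       status=1
     fi
     if [ -f "${build_dir}/CMakeCache.txt" ]; then
       echo "existing CMake cache found: ${build_dir}/CMakeCache.txt"
       status=1
     fi
     return "$status"
   }

   # Example: run from the source root before configuring the default build.
   check_clean_build_env build || echo "clean up before re-running cmake"
   ```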



