Peng Meng created SPARK-21305:
---------------------------------

             Summary: The BKM (best known methods) of using native BLAS to
improve ML/MLlib performance
                 Key: SPARK-21305
                 URL: https://issues.apache.org/jira/browse/SPARK-21305
             Project: Spark
          Issue Type: Umbrella
          Components: ML, MLlib
    Affects Versions: 2.3.0
            Reporter: Peng Meng
            Priority: Critical


Many ML/MLlib algorithms use native BLAS (like Intel MKL, ATLAS, OpenBLAS) to
improve performance.
How native BLAS is configured and called is important for performance; quite
often, native BLAS even makes performance worse.
For example, the ALS recommendForAll method before Spark 2.2 used BLAS gemm
for matrix multiplication.
If you only benchmark the matrix multiplication itself, native BLAS gemm
(like Intel MKL and OpenBLAS) is about 10X faster than netlib-java's F2j BLAS
gemm. But if you measure the Spark job end-to-end, F2j is much faster than
native BLAS, which is quite interesting.
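
For reference (not part of the original report), a minimal way to check which
BLAS backend is actually loaded, assuming the com.github.fommil.netlib
dependency that Spark 2.x MLlib uses:

{code:scala}
import com.github.fommil.netlib.BLAS

// Prints e.g. "com.github.fommil.netlib.F2jBLAS" (pure-JVM fallback) or
// "com.github.fommil.netlib.NativeSystemBLAS" when a native library is found.
println(BLAS.getInstance().getClass.getName)
{code}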

I spent a lot of time on this problem and found that we should not use a
native BLAS (like OpenBLAS or Intel MKL) that supports multi-threading
without any additional settings. By default, these native BLAS libraries
enable multi-threading, which conflicts with the Spark executor's own
threads. You can still use a multi-threaded native BLAS, but it is better to
disable its multi-threading first (see the references and the configuration
sketch after the links below).

https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded
https://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications
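
As a rough sketch (my suggestion, not an official recommendation), the BLAS
thread count can be pinned to 1 on the executors via the standard
OPENBLAS_NUM_THREADS / MKL_NUM_THREADS environment variables, for example
through Spark's executorEnv configuration:

{code:scala}
import org.apache.spark.sql.SparkSession

// Pin native BLAS to a single thread per task so it does not oversubscribe
// the cores that the Spark executor already schedules.
val spark = SparkSession.builder()
  .appName("ALS with single-threaded native BLAS")
  .config("spark.executorEnv.OPENBLAS_NUM_THREADS", "1")
  .config("spark.executorEnv.MKL_NUM_THREADS", "1")
  .getOrCreate()
{code}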

I think we should add some notes about this to docs/ml-guide.md first.


