Peng Meng created SPARK-20446:
---------------------------------

             Summary: Optimize the process of MLLIB ALS recommendForAll
                 Key: SPARK-20446
                 URL: https://issues.apache.org/jira/browse/SPARK-20446
             Project: Spark
          Issue Type: Improvement
          Components: ML, MLlib
    Affects Versions: 2.3.0
            Reporter: Peng Meng


The recommendForAll method of MLlib ALS is very slow.
GC is a key problem with the current implementation.
Each task uses the following code to hold the temporary result:

    val output = new Array[(Int, (Int, Double))](m * n)

where m = n = 4096 (the default block size; there is no method to set it).
So output is about 4k * 4k * (4 + 4 + 8) bytes = 256 MB. This is a large amount
of memory: it causes serious GC problems and frequently leads to OOM errors.
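A quick check of the 256M figure, assuming 4-byte Int and 8-byte Double fields. This counts raw payload only; JVM object headers and the boxing of each tuple make the real footprint larger. The names below (bytesPerEntry, payloadBytes, blockSize) are hypothetical, for illustration only.

```scala
// Raw payload of the temp buffer, ignoring JVM object headers and tuple boxing.
// Each element is (userId: Int, (productId: Int, score: Double)) => 4 + 4 + 8 bytes.
val bytesPerEntry: Long = 4L + 4L + 8L

// Hypothetical helper: raw bytes needed for `entries` elements.
def payloadBytes(entries: Long): Long = entries * bytesPerEntry

// m = n = 4096, the fixed block size
val blockSize: Long = 4096L
val fullBuffer: Long = payloadBytes(blockSize * blockSize)

println(s"full m*n buffer: $fullBuffer bytes = ${fullBuffer / (1024L * 1024L)} MiB")
```

This prints `full m*n buffer: 268435456 bytes = 256 MiB`.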

Actually, we don't need to keep the whole temporary result. Suppose we recommend
the top K products for each user (topK is usually about 10 or 20); then we only
need 4k * topK * (4 + 4 + 8) bytes of memory to hold the temporary result.
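A minimal sketch of the idea, assuming each user's scores arrive as an iterator of (productId, score) pairs. It uses a standard bounded min-heap; topKForUser is a hypothetical helper, not a Spark API, and this is not necessarily how Spark implements the change. Per user it keeps at most topK entries, so a block of 4096 users needs O(4096 * topK) entries instead of 4096 * 4096.

```scala
import scala.collection.mutable

// Hypothetical helper: keep only the topK (productId, score) pairs with the
// highest scores for one user. A min-heap of size <= topK is used: its root is
// the weakest candidate kept so far, so each new score is compared against it
// and the heap never grows past topK entries.
def topKForUser(scores: Iterator[(Int, Double)], topK: Int): Array[(Int, Double)] = {
  require(topK > 0, "topK must be positive")
  // PriorityQueue dequeues the "greatest" element by its Ordering. Treating a
  // higher score as "less" makes the head the lowest-scoring entry (min-heap).
  val lowestScoreFirst = Ordering.fromLessThan[(Int, Double)]((a, b) => a._2 > b._2)
  val heap = mutable.PriorityQueue.empty[(Int, Double)](lowestScoreFirst)
  scores.foreach { case entry @ (_, score) =>
    if (heap.size < topK) {
      heap.enqueue(entry)
    } else if (score > heap.head._2) {
      // Evict the current weakest candidate and keep the better one.
      heap.dequeue()
      heap.enqueue(entry)
    }
  }
  // Return the kept entries in descending score order.
  heap.toArray.sortWith(_._2 > _._2)
}
```

For example, `topKForUser(Iterator((1, 0.5), (2, 2.0), (3, 1.0), (4, 3.0), (5, 0.1)), 2)` returns `Array((4, 3.0), (2, 2.0))`. With topK = 10, the raw payload for a 4096-user block is 4096 * 10 * 16 = 655,360 bytes (640 KiB), compared with 256 MiB for the full m * n buffer.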

 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
