Peng Meng created SPARK-20446:
---------------------------------
Summary: Optimize the process of MLLIB ALS recommendForAll
Key: SPARK-20446
URL: https://issues.apache.org/jira/browse/SPARK-20446
Project: Spark
Issue Type: Improvement
Components: ML, MLlib
Affects Versions: 2.3.0
Reporter: Peng Meng
recommendForAll in MLlib ALS is very slow.
GC pressure is a key problem with the current implementation.
Each task uses the following code to hold its temporary results:
val output = new Array[(Int, (Int, Double))](m*n)
Here m = n = 4096 (the default value; there is no method to set it),
so output holds about 4k * 4k * (4 + 4 + 8) bytes = 256 MB. An allocation this
large causes serious GC problems and frequently leads to OOM.
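The estimate above can be checked with a few lines of Scala. This is only a sketch of the arithmetic: the object and method names are hypothetical, and the 16 bytes per entry is the payload count used in this issue (4 + 4 + 8), not a measured JVM object size.

```scala
object AlsTempMemory {
  // Payload bytes per (Int, (Int, Double)) entry as counted in the issue: 4 + 4 + 8.
  val bytesPerEntry: Long = 4L + 4L + 8L

  // Bytes needed to hold rows * cols temporary entries.
  def tempBytes(rows: Long, cols: Long): Long = rows * cols * bytesPerEntry

  def main(args: Array[String]): Unit = {
    val mib = 1024L * 1024L
    // m = n = 4096, the block size quoted in the issue.
    println(s"full block: ${tempBytes(4096L, 4096L) / mib} MiB")
  }
}
```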
Actually, we don't need to save all the temporary results. Suppose we recommend
topK (about 10 or 20) products for each user: we only need 4k * topK * (4 +
4 + 8) bytes of memory to hold the temporary results.
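One way to keep only topK results per user is a bounded min-heap, sketched below. This is an illustration of the idea, not the change proposed for Spark: the names TopKSketch, topK, dot, and recommend are hypothetical, and the sketch uses scala.collection.mutable.PriorityQueue from the standard library rather than any Spark-internal class.

```scala
import scala.collection.mutable

object TopKSketch {
  // Keep only the k highest-scoring (item, score) pairs from a stream of scores.
  def topK(scores: Iterator[(Int, Double)], k: Int): Array[(Int, Double)] = {
    // Reversed ordering turns the max-heap into a min-heap on score,
    // so head is the weakest of the current best k.
    val byScore = Ordering.by[(Int, Double), Double](_._2).reverse
    val heap = mutable.PriorityQueue.empty[(Int, Double)](byScore)
    scores.foreach { case entry @ (_, score) =>
      if (heap.size < k) {
        heap.enqueue(entry)
      } else if (k > 0 && score > heap.head._2) {
        // The new score beats the weakest kept entry: replace it.
        heap.dequeue()
        heap.enqueue(entry)
      }
    }
    // Return the kept entries sorted by descending score.
    heap.toArray.sortBy(e => -e._2)
  }

  // Plain dot product of two factor vectors of equal length.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      sum += a(i) * b(i)
      i += 1
    }
    sum
  }

  // Score every item for one user, holding at most k results at any time.
  def recommend(userFactors: Array[Double],
                items: Seq[(Int, Array[Double])],
                k: Int): Array[(Int, Double)] =
    topK(items.iterator.map { case (id, f) => (id, dot(userFactors, f)) }, k)
}
```

With this shape, the temporary state per user is at most topK entries instead of n, which matches the 4k * topK * (4 + 4 + 8) estimate for a block of 4k users.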