[ 
https://issues.apache.org/jira/browse/SPARK-20446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981148#comment-15981148
 ] 

Peng Meng edited comment on SPARK-20446 at 4/24/17 4:18 PM:
------------------------------------------------------------

Thanks [~mlnick]. I also compared the DataFrame version of ALS 
recommendForAll, but found no big performance improvement. 
In our solution: 
1. We use BLAS 2 instead of BLAS 3, which removes much of the GC caused by 
the BLAS computation.
2. We use output = Array[(Int, (Int, Double))](4096*topK) instead of output = 
Array[(Int, (Int, Double))](4096*4096), which greatly reduces memory 
allocation. 
3. We use a priority queue for the sort, which is about 40% faster than a 
general sort.
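
As a rough illustration of points 1 and 3 (this is not the actual patch), scoring one user against a block of items and keeping only the best topK entries could be sketched as follows. The names TopKSketch, dot and topKForUser are made up for this example, and the plain Scala loop is an assumed stand-in for a real BLAS level 2 call.

```scala
import scala.collection.mutable

object TopKSketch {
  /** Dot product of two equal-length factor vectors. */
  def dot(a: Array[Double], b: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      sum += a(i) * b(i)
      i += 1
    }
    sum
  }

  /**
   * Scores one user against every item in a block (a matrix-vector
   * product, i.e. BLAS level 2 style) and keeps only the k best items.
   */
  def topKForUser(
      userFactor: Array[Double],
      items: Array[(Int, Array[Double])],
      k: Int): Array[(Int, Double)] = {
    require(k > 0, "k must be positive")
    // Reversed ordering turns the max-heap into a min-heap on the score,
    // so the head is always the weakest candidate currently retained.
    val heap = mutable.PriorityQueue.empty[(Int, Double)](
      Ordering.by[(Int, Double), Double](_._2).reverse)
    for ((itemId, itemFactor) <- items) {
      val score = dot(userFactor, itemFactor)
      if (heap.size < k) {
        heap.enqueue((itemId, score))
      } else if (score > heap.head._2) {
        // The new score beats the weakest retained one: swap it in.
        heap.dequeue()
        heap.enqueue((itemId, score))
      }
    }
    // At most k entries remain; sort them best-first for the caller.
    heap.toArray.sortBy(-_._2)
  }
}
```

With this shape, each user holds at most k candidates at any time, so the per-block buffer grows with blockSize * topK rather than blockSize * blockSize.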

In our experiments with different configurations (different numbers of 
machines and different numbers of cores per machine), this solution is about 
5x-9x faster than the current method when the blockSize is 4096. 

There is no OOM with this solution, and its performance is about the same 
across different blockSize values. For the old method, performance depends 
heavily on blockSize.
cc [~mengxr]





> Optimize the process of MLLIB ALS recommendForAll
> -------------------------------------------------
>
>                 Key: SPARK-20446
>                 URL: https://issues.apache.org/jira/browse/SPARK-20446
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 2.3.0
>            Reporter: Peng Meng
>
> The recommendForAll of MLLIB ALS is very slow.
> GC is a key problem with the current method. 
> Each task uses the following code to hold temporary results:
> val output = new Array[(Int, (Int, Double))](m*n)
> m = n = 4096 (the default value; there is no way to set it)
> so output is about 4k * 4k * (4 + 4 + 8) = 256M. This is a large allocation 
> that causes serious GC problems and frequent OOMs.
> Actually, we don't need to save all the temporary results. Suppose we 
> recommend topK (topK is about 10 or 20) products for each user; we then only 
> need 4k * topK * (4 + 4 + 8) bytes of memory to hold the temporary results.
> I have written a solution along these lines, with the following test results. 
> The test environment:
> 3 workers, each with 10 cores, 30G of memory, and 1 executor.
> The data: 480,000 users and 17,000 items.
> BlockSize:      1024   2048   4096   8192
> Old method:     245s   332s   488s   OOM
> This solution:  121s   118s   117s   120s
>  
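
The memory arithmetic in the description can be checked with a small sketch. It counts only the 4 + 4 + 8 payload bytes per entry, as the description does; the real JVM footprint of an array of tuples is likely higher because of object headers and boxing. OutputBufferSize and bufferBytes are hypothetical names used only here, and topK = 10 is one of the values mentioned in the description.

```scala
object OutputBufferSize {
  // Payload bytes per (Int, (Int, Double)) entry, as counted in the description.
  val bytesPerEntry: Long = 4L + 4L + 8L

  /** Payload bytes for a buffer holding rows * cols entries. */
  def bufferBytes(rows: Int, cols: Int): Long =
    rows.toLong * cols * bytesPerEntry

  def main(args: Array[String]): Unit = {
    val blockSize = 4096 // default block size from the description
    val topK = 10        // assumed typical value
    println(s"full block: ${bufferBytes(blockSize, blockSize)} bytes")
    println(s"top-K only: ${bufferBytes(blockSize, topK)} bytes")
  }
}
```

For blockSize = 4096 this gives 268,435,456 bytes (256 MiB) for the full block versus 655,360 bytes (640 KiB) when only topK = 10 entries per user are kept.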


