[ 
https://issues.apache.org/jira/browse/SPARK-20443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peng Meng updated SPARK-20443:
------------------------------
    Description: 
The blockSize of MLlib ALS has a large effect on ALS performance.
In our test, recommendationForAll with blockSize 128 ran about 4x faster than
with blockSize 4096 (the default value).
Our test results:
BlockSize(recommendationForAll time)
128(124s), 256(160s), 512(184s), 1024(244s), 2048(332s), 4096(488s), 8192(OOM)

The Test Environment:
3 workers, each with 10 cores, 30 GB of memory, and 1 executor.
The Data: 480,000 users and 17,000 items

  was:
The blockSize of MLLIB ALS is very important for ALS performance. 
In our test, when the blockSize is 128, the performance is about 4X comparing 
with the blockSize is 4096 (default value).
The following are our test results: 
BlockSize(recommendationForAll time)
128(124s), 256(160s), 512(184s), 1024(244s), 2048(332s), 4096(488s), 8192(OOM)

The Test Environment:
3 workers: each work 10 core, each work 30G memory, each work 1 executor.
The Data: User 48W, and Item 1.7W


> The blockSize of MLlib ALS should be settable by the user
> ---------------------------------------------------------
>
>                 Key: SPARK-20443
>                 URL: https://issues.apache.org/jira/browse/SPARK-20443
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 2.3.0
>            Reporter: Peng Meng
>            Priority: Minor
>
> The blockSize of MLlib ALS has a large effect on ALS performance.
> In our test, recommendationForAll with blockSize 128 ran about 4x faster than
> with blockSize 4096 (the default value).
> Our test results:
> BlockSize(recommendationForAll time)
> 128(124s), 256(160s), 512(184s), 1024(244s), 2048(332s), 4096(488s), 8192(OOM)
> The Test Environment:
> 3 workers, each with 10 cores, 30 GB of memory, and 1 executor.
> The Data: 480,000 users and 17,000 items
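
The "about 4X" figure can be checked directly against the timings reported above. The following is a minimal Python sketch of that arithmetic only; it does not call Spark, and the names `TIMINGS_S`, `DEFAULT_BLOCK_SIZE`, and `speedups` are hypothetical helpers introduced for illustration. The 8192 run is left out because it ended in an OOM rather than a time.

```python
# Timings reported in the issue: blockSize -> recommendationForAll time (seconds).
# blockSize 8192 is omitted because that run ran out of memory.
TIMINGS_S = {128: 124, 256: 160, 512: 184, 1024: 244, 2048: 332, 4096: 488}

# The issue states that 4096 is the default blockSize.
DEFAULT_BLOCK_SIZE = 4096


def speedups(timings, baseline):
    """Return how many times faster each setting is than the baseline setting."""
    base_time = timings[baseline]
    return {block_size: base_time / t for block_size, t in timings.items()}


if __name__ == "__main__":
    ratios = speedups(TIMINGS_S, DEFAULT_BLOCK_SIZE)
    for block_size in sorted(ratios):
        print(
            f"blockSize={block_size:>4}: {TIMINGS_S[block_size]:>3}s, "
            f"{ratios[block_size]:.2f}x vs. default"
        )
```

With these numbers, blockSize 128 comes out at 488 / 124, or roughly 3.9 times faster than the default, which matches the "about 4X" claim.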


