[
https://issues.apache.org/jira/browse/SPARK-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14221379#comment-14221379
]
Debasish Das commented on SPARK-3066:
-------------------------------------
I ran experiments on the MovieLens dataset with varying rank on my local Spark
instance (4 GB RAM, 4 cores) to see how much MAP improves as the rank is
scaled up:
===
rank=10 (default)
Got 1000209 ratings from 6040 users on 3706 movies.
Training: 799747, test: 200462.
Test RMSE = 0.8528377625407709.
Test users 6036 MAP 0.03851426277536059
Runtime: 30 s
===
rank=25
Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800417, test: 199792.
Test RMSE = 0.8518001349769724.
Test users 6037 MAP 0.04508057348514959
Runtime: 30 s
===
rank=50
Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800823, test: 199386.
Test RMSE = 0.8487416471685229.
Test users 6038 MAP 0.05145126538369158
Runtime: 42 s
===
rank=100
Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800720, test: 199489.
Test RMSE = 0.8508095863317275.
Test users 6033 MAP 0.0561225429735388
Runtime: 1.5 min
===
rank=150
Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800257, test: 199952.
Test RMSE = 0.8435902056186158.
Test users 6035 MAP 0.05855252471878818
Runtime: 3.6 min
===
rank=200
Got 1000209 ratings from 6040 users on 3706 movies.
Training: 800356, test: 199853.
Test RMSE = 0.8452385688272382.
Test users 6037 MAP 0.059176892052172934
Runtime: 7.4 min
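===
To put the runs above side by side, the following sketch computes the MAP gain
and the runtime slowdown of each rank relative to rank=10. The MAP and runtime
figures are copied from the output above (runtimes converted to seconds); the
function name `relative_gains` is hypothetical and only for illustration.

```python
# (rank, test MAP, runtime in seconds), copied from the runs above.
RESULTS = [
    (10, 0.03851426277536059, 30.0),
    (25, 0.04508057348514959, 30.0),
    (50, 0.05145126538369158, 42.0),
    (100, 0.0561225429735388, 90.0),    # 1.5 min
    (150, 0.05855252471878818, 216.0),  # 3.6 min
    (200, 0.059176892052172934, 444.0), # 7.4 min
]


def relative_gains(results):
    """Return (rank, MAP gain in percent, runtime ratio) against the first row."""
    _, base_map, base_seconds = results[0]
    return [
        (rank, (map_score / base_map - 1.0) * 100.0, seconds / base_seconds)
        for rank, map_score, seconds in results
    ]


if __name__ == "__main__":
    for rank, gain_pct, slowdown in relative_gains(RESULTS):
        print(f"rank={rank:3d}  MAP gain={gain_pct:5.1f}%  runtime x{slowdown:.1f}")
```

Each run used a different random train/test split, so small differences between
neighbouring ranks should be read with some caution.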
I will also run the MovieLens 10M and Netflix datasets and generate numbers for
them with varying ranks. Those runs need a cluster.
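
For reference, a minimal sketch of how a MAP figure like the ones above can be
computed. The comment does not say exactly how MAP was evaluated, so this is an
assumption: it uses a common definition of mean average precision at a cutoff
k, normalized by min(number of relevant items, k), averaged over users that
have at least one relevant test item. The function names and the default k=10
are hypothetical.

```python
def average_precision(ranked, relevant, k):
    """Average precision at k for one user.

    ranked:   recommended item ids, best first (assumed free of duplicates)
    relevant: set of item ids the user actually liked in the test set
    """
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for position, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / position  # precision at this position
    return score / min(len(relevant), k)


def mean_average_precision(recommendations, truth, k=10):
    """Mean of average_precision over users with at least one relevant item."""
    users = [u for u, items in truth.items() if items]
    if not users:
        return 0.0
    total = sum(
        average_precision(recommendations.get(u, []), truth[u], k) for u in users
    )
    return total / len(users)
```

Other normalizations exist (for example dividing by the full number of relevant
items regardless of k), so absolute MAP values are only comparable when the same
definition is used throughout.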
> Support recommendAll in matrix factorization model
> --------------------------------------------------
>
> Key: SPARK-3066
> URL: https://issues.apache.org/jira/browse/SPARK-3066
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Xiangrui Meng
>
> ALS returns a matrix factorization model, which we can use to predict ratings
> for individual queries as well as small batches. In practice, users may want
> to compute top-k recommendations offline for all users. This is very expensive
> but a common problem. We can apply some optimizations, such as:
> 1) collect one side (either user or product) and broadcast it as a matrix
> 2) use level-3 BLAS to compute inner products
> 3) use Utils.takeOrdered to find top-k
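
The three optimizations listed in the issue description can be sketched as
follows. This is a single-machine illustration in plain Python, not the MLlib
implementation: collecting the product factors into a local list stands in for
the broadcast, the per-block score computation is the matrix product that a
level-3 BLAS routine (gemm) would perform, and `heapq.nlargest` stands in for
`Utils.takeOrdered`. The names `recommend_all`, `dot` and `block_size` are
hypothetical.

```python
import heapq


def dot(u, v):
    """Inner product of two equal-length factor vectors."""
    return sum(a * b for a, b in zip(u, v))


def recommend_all(user_factors, product_factors, k, block_size=1024):
    """Return the top-k (product_id, score) pairs for every user.

    user_factors / product_factors: dict mapping id -> list of floats.
    """
    # 1) Collect one side (here: products) into a local matrix. In Spark this
    #    matrix would be broadcast to the executors.
    product_ids = list(product_factors)
    product_matrix = [product_factors[p] for p in product_ids]

    user_ids = list(user_factors)
    recommendations = {}
    for start in range(0, len(user_ids), block_size):
        block = user_ids[start:start + block_size]
        # 2) Scores for a whole block of users at once: (block x rank) times
        #    (rank x products). A BLAS gemm call would replace these loops.
        scores = [
            [dot(user_factors[u], pv) for pv in product_matrix] for u in block
        ]
        # 3) Keep only the k best products per user without a full sort.
        for u, row in zip(block, scores):
            top = heapq.nlargest(k, zip(row, product_ids))
            recommendations[u] = [(p, s) for s, p in top]
    return recommendations
```

Blocking the users keeps the intermediate score matrix at block_size x
(number of products) instead of materializing scores for all user-product
pairs at once.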