[
https://issues.apache.org/jira/browse/SPARK-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376046#comment-14376046
]
Debasish Das commented on SPARK-3735:
-------------------------------------
We might want to consider doing some of these things through an indexed RDD
exposed through an API. Right now ALS is completely join-based. Could we do
something nicer if we had access to an efficient read-only cache from within
ALS mapPartitions? The idea is to handle zeros explicitly rather than add the
implicit heuristic, which is generally hard to tune.
> Sending the factor directly or AtA based on the cost in ALS
> -----------------------------------------------------------
>
> Key: SPARK-3735
> URL: https://issues.apache.org/jira/browse/SPARK-3735
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Reporter: Xiangrui Meng
> Assignee: Xiangrui Meng
>
> It is common to have some super popular products in the dataset. In this
> case, sending many user factors to the target product block could be more
> expensive than sending the normal equation terms `\sum_i u_i u_i^T` and
> `\sum_i u_i r_ij` to the product block. The cost of sending a single factor
> is `k`, while the cost of sending a normal equation is much higher:
> `k * (k + 3) / 2`. However, if we use the normal equation for all products
> associated with a user, we don't need to send that user's factor at all.
> Determining the optimal assignment is hard. But we could use a simple
> heuristic. Inside any rating block,
> 1) order the product ids by the number of user ids associated with them, in
> descending order
> 2) starting from the most popular product, mark popular products as "use
> normal eq" and calculate the cost
> Remember the assignment with the lowest cost and use it for the
> computation.
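
The heuristic in the issue description could be sketched roughly as follows.
This is an illustration only, not code from Spark. The function name
`choose_normal_eq_products`, the `(user_id, product_id)` input format, and the
exact cost model are assumptions: each product marked "use normal eq" is
charged `k * (k + 3) / 2` (`k * (k + 1) / 2` entries for the symmetric matrix
plus `k` for the right-hand side), and each user who still has at least one
unmarked product in the block is charged `k` for sending the factor.

```python
from collections import defaultdict


def choose_normal_eq_products(ratings, k):
    """Pick which products in one rating block should use the normal equation.

    ratings: iterable of (user_id, product_id) pairs (hypothetical format).
    k: rank of the factorization.
    Returns (products marked "use normal eq", estimated communication cost).
    """
    factor_cost = k                      # cost of sending one user factor
    normal_eq_cost = k * (k + 3) // 2    # cost of sending one normal equation

    users_by_product = defaultdict(set)
    for user, product in ratings:
        users_by_product[product].add(user)

    # unmarked[u] = number of u's products that still need u's factor
    unmarked = defaultdict(int)
    for users in users_by_product.values():
        for user in users:
            unmarked[user] += 1

    # Step 1: order product ids by number of associated users, descending.
    ordered = sorted(users_by_product,
                     key=lambda p: (-len(users_by_product[p]), p))

    # Step 2: mark products one by one, most popular first, tracking the cost.
    users_to_send = len(unmarked)
    best_cost = users_to_send * factor_cost   # nothing marked yet
    best_prefix = 0
    for marked, product in enumerate(ordered, start=1):
        for user in users_by_product[product]:
            unmarked[user] -= 1
            if unmarked[user] == 0:
                users_to_send -= 1            # this factor is no longer needed
        cost = marked * normal_eq_cost + users_to_send * factor_cost
        if cost < best_cost:
            best_cost, best_prefix = cost, marked

    return ordered[:best_prefix], best_cost


# One popular product "A" (20 users) and one rare product "B" (2 users), k = 10.
example = [(u, "A") for u in range(20)] + [(0, "B"), (1, "B")]
print(choose_normal_eq_products(example, 10))  # (['A'], 85)
```

In this toy block, sending all 20 factors costs 200, marking only "A" costs
65 + 2 * 10 = 85, and marking both products costs 130, so only "A" is marked.
Whether the same cost model fits the actual block layout in ALS is an open
question here.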