Xiangrui Meng created SPARK-3735:
------------------------------------
Summary: Sending the factor directly or AtA based on the cost in ALS
Key: SPARK-3735
URL: https://issues.apache.org/jira/browse/SPARK-3735
Project: Spark
Issue Type: Improvement
Components: ML, MLlib
Reporter: Xiangrui Meng
It is common to have a few very popular products in the dataset. In this case,
sending many user factors to the target product block could be more expensive
than sending the normal equation terms `\sum_i u_i u_i^T` and `\sum_i u_i r_ij`
to the product block. Sending a single factor costs `k`, while sending a normal
equation costs much more, `k * (k + 3) / 2`: `k * (k + 1) / 2` for the symmetric
matrix plus `k` for the vector.
However, if we use the normal equation for all products associated with a user,
we don't need to send this user's factor at all.
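The two costs above can be compared with a small sketch. The function names are hypothetical and are not part of Spark; the cost unit is "numbers sent", assuming only the upper triangle of the symmetric matrix is shipped.

```python
def factor_cost(k: int) -> int:
    """Numbers sent for one user factor of rank k."""
    return k


def normal_eq_cost(k: int) -> int:
    """Numbers sent for one product's normal equation."""
    ata = k * (k + 1) // 2  # upper triangle of the symmetric k x k matrix
    atb = k                 # the right-hand-side vector
    return ata + atb        # equals k * (k + 3) / 2


def break_even_users(k: int) -> int:
    """Smallest number of user factors whose total cost exceeds one normal equation."""
    return normal_eq_cost(k) // factor_cost(k) + 1
```

For `k = 10`, `normal_eq_cost(10)` is 65 and `break_even_users(10)` is 7. This per-product view ignores that a user factor may still have to be sent for other products, which is why the assignment has to be evaluated per rating block.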
Determining the optimal assignment is hard, but we could use a simple
heuristic. Inside each rating block:
1) Order the product ids by the number of user ids associated with them, in
descending order.
2) Starting from the most popular product, mark products one by one as "use
normal eq" and calculate the cost after each step.
Remember the assignment with the lowest cost and use it for the
computation.
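A minimal sketch of this heuristic for a single rating block follows. It is an illustration under assumptions, not the Spark implementation: the function name and the input shape (a list of `(user_id, product_id)` pairs) are hypothetical, and the cost model assumes a user factor is sent once per block whenever the user still has at least one unmarked product.

```python
from collections import defaultdict


def choose_normal_eq_products(ratings, k):
    """Return (products marked "use normal eq", total cost) for one rating block.

    ratings: iterable of (user_id, product_id) pairs (hypothetical input shape).
    k: rank of the factors.
    """
    users_of = defaultdict(set)
    for user, product in ratings:
        users_of[product].add(user)

    # Step 1: most popular products first (ties broken by product id).
    order = sorted(users_of, key=lambda p: (-len(users_of[p]), p))

    # unmarked[u] = number of u's products that still need u's factor.
    unmarked = defaultdict(int)
    for product in order:
        for user in users_of[product]:
            unmarked[user] += 1

    ne_cost = k * (k + 3) // 2
    users_to_send = len(unmarked)
    best_cost = users_to_send * k  # baseline: no product uses the normal equation
    best_prefix = 0

    # Step 2: mark products one by one and track the cheapest prefix.
    for i, product in enumerate(order, start=1):
        for user in users_of[product]:
            unmarked[user] -= 1
            if unmarked[user] == 0:
                users_to_send -= 1  # this user's factor is no longer needed
        cost = i * ne_cost + users_to_send * k
        if cost < best_cost:
            best_cost, best_prefix = cost, i

    return set(order[:best_prefix]), best_cost
```

For example, with `k = 10`, one product rated by 20 users and a second product rated by 2 of those users, the sketch marks only the first product and reports a cost of 85 (65 for the normal equation plus 2 factors), compared with 200 for sending all 20 factors.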