Xiangrui Meng created SPARK-3735:
------------------------------------

             Summary: Sending the factor directly or AtA based on the cost in ALS
                 Key: SPARK-3735
                 URL: https://issues.apache.org/jira/browse/SPARK-3735
             Project: Spark
          Issue Type: Improvement
          Components: ML, MLlib
            Reporter: Xiangrui Meng


It is common for a dataset to contain a few very popular products. In that 
case, sending many user factors to the target product block can be more 
expensive than sending the normal equation terms `\sum_i u_i u_i^T` and 
`\sum_i u_i r_ij` to the product block. Sending a single factor costs `k`, 
while sending a normal equation costs much more, `k * (k + 3) / 2`. However, 
if we use the normal equation for every product associated with a user, we 
do not need to send that user's factor at all.
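
To make the two costs concrete, here is a minimal Python sketch. It assumes the `k * (k + 3) / 2` figure counts the upper triangle of the symmetric `k x k` matrix plus the length-`k` right-hand side vector; the function names are hypothetical and are not part of Spark.

```python
def factor_cost(k: int) -> int:
    """Numbers sent when shipping one user factor of rank k."""
    return k


def normal_eq_cost(k: int) -> int:
    """Numbers sent when shipping one product's normal equation.

    Assumption: only the upper triangle of the symmetric k x k matrix
    sum_i u_i u_i^T is sent (k * (k + 1) / 2 entries), plus the k entries
    of sum_i u_i r_ij. Together that equals k * (k + 3) / 2.
    """
    return k * (k + 1) // 2 + k


def min_users_for_normal_eq(k: int) -> int:
    """Smallest number of user factors whose total cost exceeds one normal equation."""
    return normal_eq_cost(k) // factor_cost(k) + 1
```

Under these assumptions, with rank `k = 10` a normal equation costs 65 numbers, so it only pays off for a product that would otherwise need at least 7 user factors sent solely on its behalf.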

Determining the optimal assignment is hard, but we could use a simple 
heuristic. Inside each rating block:

1) Order the product ids by the number of user ids associated with them, in 
descending order.
2) Starting from the most popular product, mark products one at a time as 
"use normal eq" and calculate the resulting cost after each step.

Remember the assignment with the lowest cost and use it for the 
computation.
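
The heuristic could be sketched in Python roughly as follows. This is an illustration only, not the ALS implementation: the function name, the `(user_id, product_id)` input format, and the cost model (each user factor still needed is sent once per block at cost `k`, each marked product costs `k * (k + 3) / 2`) are assumptions made for the example.

```python
from collections import defaultdict


def choose_normal_eq_products(ratings, k):
    """Pick which products in one rating block should use the normal equation.

    ratings: iterable of (user_id, product_id) pairs in the block (assumed format).
    Returns (set of marked product ids, estimated communication cost).
    """
    users_by_product = defaultdict(set)
    for user, product in ratings:
        users_by_product[product].add(user)

    # For each user, count the products that still need the raw factor.
    remaining = defaultdict(int)
    for users in users_by_product.values():
        for user in users:
            remaining[user] += 1

    # Step 1: most popular products first (ties broken by product id).
    ordered = sorted(users_by_product, key=lambda p: (-len(users_by_product[p]), p))

    ne_cost = k * (k + 3) // 2
    users_to_send = len(remaining)
    best_cost, best_m = users_to_send * k, 0  # baseline: no product marked

    # Step 2: mark products one by one and track the cheapest prefix.
    for m, product in enumerate(ordered, start=1):
        for user in users_by_product[product]:
            remaining[user] -= 1
            if remaining[user] == 0:
                users_to_send -= 1  # every product of this user is marked
        cost = users_to_send * k + m * ne_cost
        if cost < best_cost:
            best_cost, best_m = cost, m

    return set(ordered[:best_m]), best_cost
```

For example, with `k = 10`, a product rated by 20 users, and a second product rated by only one of them, marking just the popular product lowers the estimated cost from 200 to 75, while marking both would raise it to 130.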



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
