[ 
https://issues.apache.org/jira/browse/FLINK-4613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15650594#comment-15650594
 ] 

ASF GitHub Bot commented on FLINK-4613:
---------------------------------------

Github user gaborhermann commented on the issue:

    https://github.com/apache/flink/pull/2542
  
    Thanks @jfeher for the measurements! :)
    
    @thvasilo The filtering meant keeping only distinct (user,artist) pairs. 
The input of iALS is a sparse matrix, so it would not make much sense to 
have more than one value for the same element of the matrix. To aggregate 
multiple listenings for the same (user,artist) pair, one could, e.g., count 
them and use the count as the implicit rating. We simply used the value 1.0 
for every user-artist pair, but the algorithm works with any positive 
values, not only binary interactions.
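
    To illustrate the two options above, here is a minimal Python sketch 
(not the code used for the measurements). The function name 
`to_implicit_ratings` and the toy events are made up for illustration. It 
collapses raw listening events into one entry per (user,artist) pair, either 
as a count or as the constant 1.0:

```python
from collections import Counter


def to_implicit_ratings(events, binary=False):
    """Collapse raw (user, artist) events into one rating per distinct pair.

    events: iterable of (user, artist) tuples, possibly with repeats.
    binary: if True, every observed pair gets 1.0; otherwise the rating
            is the number of times the pair occurred.
    """
    counts = Counter(events)
    if binary:
        return {pair: 1.0 for pair in counts}
    return {pair: float(n) for pair, n in counts.items()}


# Hypothetical toy data: user u1 listened to artist a1 twice.
events = [("u1", "a1"), ("u1", "a1"), ("u1", "a2"), ("u2", "a1")]

count_ratings = to_implicit_ratings(events)
binary_ratings = to_implicit_ratings(events, binary=True)
```

    Either result has exactly one value per matrix element, which is what a 
sparse input matrix needs.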
    
    We've only measured Flink against itself, as the main ALS algorithm is 
already in Flink. It would be interesting to measure against Spark and other 
solutions, but that might reflect the performance of ALS itself rather than 
the performance of our iALS extension. That seems like a separate issue to 
me. Do I see this right?


> Extend ALS to handle implicit feedback datasets
> -----------------------------------------------
>
>                 Key: FLINK-4613
>                 URL: https://issues.apache.org/jira/browse/FLINK-4613
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Gábor Hermann
>            Assignee: Gábor Hermann
>
> The Alternating Least Squares implementation should be extended to handle 
> _implicit feedback_ datasets. These datasets do not contain explicit ratings 
> by users; rather, they are built by collecting user behavior (e.g. user 
> listened to artist X for Y minutes), and they require a slightly different 
> optimization objective. For details, see [Hu et 
> al|http://dx.doi.org/10.1109/ICDM.2008.22].
> We do not need to modify much in the original ALS algorithm. See [Spark ALS 
> implementation|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala],
>  which could be a basis for this extension. Only the factor update step is 
> modified, and most of the changes are in the local parts of the algorithm 
> (i.e. UDFs). In fact, the only modification that is not local is 
> precomputing the matrix product Y^T * Y and broadcasting it to all the nodes, 
> which we can do with broadcast DataSets. 
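
A small Python sketch may help show why precomputing Y^T * Y is enough. It 
is not Flink code, and the helper names (`gram`, `weighted_gram`, 
`naive_weighted_gram`) and the toy numbers are assumptions made for 
illustration. In the formulation of Hu et al., each user's update needs 
Y^T * C^u * Y, where C^u is a diagonal matrix of confidences that equal 1 for 
every item the user did not interact with. Since 
Y^T * C^u * Y = Y^T * Y + Y^T * (C^u - I) * Y, the shared Y^T * Y can be 
computed once and each user only adds corrections for the items they 
actually interacted with:

```python
def gram(Y):
    """Return Y^T * Y for a matrix given as a list of equal-length rows."""
    k = len(Y[0])
    g = [[0.0] * k for _ in range(k)]
    for y in Y:
        for p in range(k):
            for q in range(k):
                g[p][q] += y[p] * y[q]
    return g


def weighted_gram(Y, YtY, confidences):
    """Return Y^T * C^u * Y for one user, starting from the shared Y^T * Y.

    confidences maps item index -> c_ui for the items the user interacted
    with; every other item is assumed to have confidence 1.
    """
    k = len(YtY)
    a = [row[:] for row in YtY]  # copy so the shared matrix stays untouched
    for i, c in confidences.items():
        y = Y[i]
        for p in range(k):
            for q in range(k):
                a[p][q] += (c - 1.0) * y[p] * y[q]
    return a


def naive_weighted_gram(Y, weights):
    """Reference version: sum of c_i * y_i * y_i^T over ALL items."""
    k = len(Y[0])
    a = [[0.0] * k for _ in range(k)]
    for y, c in zip(Y, weights):
        for p in range(k):
            for q in range(k):
                a[p][q] += c * y[p] * y[q]
    return a


# Hypothetical item factors: 3 items, 2 latent factors.
Y = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
YtY = gram(Y)  # computed once, then shared (broadcast) to all workers

# One user who interacted with items 0 and 2 only.
fast = weighted_gram(Y, YtY, {0: 5.0, 2: 41.0})
slow = naive_weighted_gram(Y, [5.0, 1.0, 41.0])
```

Here `fast` touches only the two observed items, while `slow` loops over 
every item, and both produce the same matrix. That is why the only non-local 
step is distributing Y^T * Y.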



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
