[ 
https://issues.apache.org/jira/browse/FLINK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15696250#comment-15696250
 ] 

ASF GitHub Bot commented on FLINK-4712:
---------------------------------------

Github user thvasilo commented on the issue:

    https://github.com/apache/flink/pull/2838
  
    > The problem is not with the evaluate(test: TestType): DataSet[Double] but 
rather with evaluate(test: TestType): DataSet[(Prediction,Prediction)].
    
    Completely agree there. I advocated for removing or renaming the `evaluate` 
function, and we previously considered a `score` function for a more 
sklearn-like approach, see e.g. #902. Having _some_ function that returns a 
`DataSet[(truth: Prediction, pred: Prediction)]` is useful and probably 
necessary, but we should look at alternatives, as the current state is confusing.
    I think I like the approach you are suggesting, so feel free to come up 
with an alternative in the WIP PRs.
    
    Getting rid of the Pipeline requirements for recommendation algorithms 
would simplify some things. In that case we'll have to re-evaluate whether it 
makes sense for them to implement the `Predictor` interface at all. 
Alternatively we could introduce `ChainablePredictors`, but I think our 
hierarchy is deep enough already.


> Implementing ranking predictions for ALS
> ----------------------------------------
>
>                 Key: FLINK-4712
>                 URL: https://issues.apache.org/jira/browse/FLINK-4712
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Domokos Miklós Kelen
>            Assignee: Gábor Hermann
>
> We started working on implementing ranking predictions for recommender 
> systems. Ranking prediction means that besides predicting scores for user-item 
> pairs, the recommender system is able to recommend a top-K list for the users.
> Details:
> In practice, this would mean finding the K items for a particular user with 
> the highest predicted rating. It should also be possible to specify whether 
> to exclude the already seen items from a particular user's toplist. (See for 
> example the 'exclude_known' setting of [Graphlab Create's ranking 
> factorization 
> recommender|https://turi.com/products/create/docs/generated/graphlab.recommender.ranking_factorization_recommender.RankingFactorizationRecommender.recommend.html#graphlab.recommender.ranking_factorization_recommender.RankingFactorizationRecommender.recommend]
>  ).
> The output of the topK recommendation function could be in the form of 
> {{DataSet[(Int,Int,Int)]}}, meaning (user, item, rank), similar to Graphlab 
> Create's output. However, this is debatable: follow-up work includes 
> implementing ranking recommendation evaluation metrics (such as precision@k, 
> recall@k, ndcg@k), similar to [Spark's 
> implementations|https://spark.apache.org/docs/1.5.0/mllib-evaluation-metrics.html#ranking-systems].
>  It would be beneficial if we were able to design the API such that it could 
> be included in the proposed evaluation framework (see 
> [2157|https://issues.apache.org/jira/browse/FLINK-2157]), which makes it 
> necessary to consider the possible output type {{DataSet[(Int, 
> Array[Int])]}} or {{DataSet[(Int, Array[(Int,Double)])]}} meaning (user, 
> array of items), possibly including the predicted scores as well. See 
> [4713|https://issues.apache.org/jira/browse/FLINK-4713] for details.
> Another question arising is whether to provide this function as a member of 
> the ALS class, as a switch-kind of parameter to the ALS implementation 
> (meaning the model is either a rating or a ranking recommender model) or in 
> some other way.
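
The top-K selection and the two candidate output shapes described above can be sketched as follows. This is only a minimal illustration, not the proposed FlinkML API: plain Scala collections stand in for `DataSet`, and `TopKSketch`, `topK`, `toRanked` and the `excludeKnown` parameter are hypothetical names chosen for this example.

```scala
// Hypothetical sketch: plain Scala collections stand in for Flink's DataSet.
object TopKSketch {
  type User = Int
  type Item = Int

  /** Top-k items per user by predicted score, grouped as (user, items best first). */
  def topK(
      scores: Seq[(User, Item, Double)], // predicted (user, item, score) triples
      seen: Set[(User, Item)],           // (user, item) pairs the user already rated
      k: Int,
      excludeKnown: Boolean = true): Map[User, Array[Item]] =
    scores
      // Optionally drop items the user has already seen.
      .filter { case (u, i, _) => !(excludeKnown && seen.contains((u, i))) }
      .groupBy(_._1)
      .map { case (u, rows) =>
        // Highest score first; ties are broken by the smaller item id.
        val ordered = rows.sortWith { (a, b) =>
          a._3 > b._3 || (a._3 == b._3 && a._2 < b._2)
        }
        u -> ordered.take(k).map(_._2).toArray
      }

  /** Flatten the grouped form into (user, item, rank) triples, with rank starting at 1. */
  def toRanked(grouped: Map[User, Array[Item]]): Seq[(User, Item, Int)] =
    grouped.toSeq.sortBy(_._1).flatMap { case (u, items) =>
      items.toSeq.zipWithIndex.map { case (i, idx) => (u, i, idx + 1) }
    }
}
```

In this sketch the grouped `(user, array of items)` form is produced first and the `(user, item, rank)` form is derived from it, which suggests that either shape could be offered without changing the underlying computation.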



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
