[ 
https://issues.apache.org/jira/browse/FLINK-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15693381#comment-15693381
 ] 

ASF GitHub Bot commented on FLINK-4712:
---------------------------------------

Github user gaborhermann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2838#discussion_r89503899
  
    --- Diff: 
flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/pipeline/Predictor.scala
 ---
    @@ -72,14 +77,142 @@ trait Predictor[Self] extends Estimator[Self] with 
WithParameters {
         */
       def evaluate[Testing, PredictionValue](
           testing: DataSet[Testing],
    -      evaluateParameters: ParameterMap = ParameterMap.Empty)(implicit
    -      evaluator: EvaluateDataSetOperation[Self, Testing, PredictionValue])
    +      evaluateParameters: ParameterMap = ParameterMap.Empty)
    +      (implicit evaluator: EvaluateDataSetOperation[Self, Testing, 
PredictionValue])
         : DataSet[(PredictionValue, PredictionValue)] = {
         FlinkMLTools.registerFlinkMLTypes(testing.getExecutionEnvironment)
         evaluator.evaluateDataSet(this, evaluateParameters, testing)
       }
     }
     
    +trait RankingPredictor[Self] extends Estimator[Self] with WithParameters {
    +  that: Self =>
    +
    +  def predictRankings(
    +    k: Int,
    +    users: DataSet[Int],
    +    predictParameters: ParameterMap = ParameterMap.Empty)(implicit
    +    rankingPredictOperation : RankingPredictOperation[Self])
    +  : DataSet[(Int,Int,Int)] =
    +    rankingPredictOperation.predictRankings(this, k, users, 
predictParameters)
    +
    +  def evaluateRankings(
    +    testing: DataSet[(Int,Int,Double)],
    +    evaluateParameters: ParameterMap = ParameterMap.Empty)(implicit
    +    rankingPredictOperation : RankingPredictOperation[Self])
    +  : DataSet[(Int,Int,Int)] = {
    +    // todo: do not burn 100 topK into code
    +    predictRankings(100, testing.map(_._1).distinct(), evaluateParameters)
    +  }
    +}
    +
    +trait RankingPredictOperation[Instance] {
    +  def predictRankings(
    +    instance: Instance,
    +    k: Int,
    +    users: DataSet[Int],
    +    predictParameters: ParameterMap = ParameterMap.Empty)
    +  : DataSet[(Int, Int, Int)]
    +}
    +
    +/**
    +  * Trait for providing auxiliary data for ranking evaluations.
    +  *
    +  * They are useful e.g. for excluding items found in the training 
[[DataSet]]
    +  * from the recommended top K items.
    +  */
    +trait TrainingRatingsProvider {
    +
    +  def getTrainingData: DataSet[(Int, Int, Double)]
    +
    +  /**
    +    * Retrieving the training items.
    +    * Although this can be calculated from the training data, it requires 
a costly
    +    * [[DataSet.distinct]] operation, while in matrix factor models the 
set items could be
    +    * given more efficiently from the item factors.
    +    */
    +  def getTrainingItems: DataSet[Int] = {
    +    getTrainingData.map(_._2).distinct()
    +  }
    +}
    +
    +/**
    +  * Ranking predictions for the most common case.
    +  * If we can predict ratings, we can compute top K lists by sorting the 
predicted ratings.
    +  */
    +class RankingFromRatingPredictOperation[Instance <: 
TrainingRatingsProvider]
    +(val ratingPredictor: PredictDataSetOperation[Instance, (Int, Int), (Int, 
Int, Double)])
    +  extends RankingPredictOperation[Instance] {
    +
    +  private def getUserItemPairs(users: DataSet[Int], items: DataSet[Int], 
exclude: DataSet[(Int, Int)])
    +  : DataSet[(Int, Int)] = {
    +    users.cross(items)
    --- End diff --
    
    I completely agree. There are use-cases where we would not like to give 
rankings from all the items. E.g. when recommending TV programs, we would only 
like to recommend currently running TV programs, but train on all of them.
    
    We'll include an `item` DataSet parameter to ranking predictions.
    
    (Btw. I believe the "Flink-way" is to let the user configure as much as 
possible, but that's just my opinion :) )


> Implementing ranking predictions for ALS
> ----------------------------------------
>
>                 Key: FLINK-4712
>                 URL: https://issues.apache.org/jira/browse/FLINK-4712
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Domokos Miklós Kelen
>            Assignee: Gábor Hermann
>
> We started working on implementing ranking predictions for recommender 
> systems. Ranking prediction means that beside predicting scores for user-item 
> pairs, the recommender system is able to recommend a top K list for the users.
> Details:
> In practice, this would mean finding the K items for a particular user with 
> the highest predicted rating. It should be possible also to specify whether 
> to exclude the already seen items from a particular user's toplist. (See for 
> example the 'exclude_known' setting of [Graphlab Create's ranking 
> factorization 
> recommender|https://turi.com/products/create/docs/generated/graphlab.recommender.ranking_factorization_recommender.RankingFactorizationRecommender.recommend.html#graphlab.recommender.ranking_factorization_recommender.RankingFactorizationRecommender.recommend]
>  ).
> The output of the topK recommendation function could be in the form of 
> {{DataSet[(Int,Int,Int)]}}, meaning (user, item, rank), similar to Graphlab 
> Create's output. However, this is arguable: follow up work includes 
> implementing ranking recommendation evaluation metrics (such as precision@k, 
> recall@k, ndcg@k), similar to [Spark's 
> implementations|https://spark.apache.org/docs/1.5.0/mllib-evaluation-metrics.html#ranking-systems].
>  It would be beneficial if we were able to design the API such that it could 
> be included in the proposed evaluation framework (see 
> [5157|https://issues.apache.org/jira/browse/FLINK-2157]), which makes it 
> neccessary to consider the possible output type {{DataSet[(Int, 
> Array[Int])]}} or {{DataSet[(Int, Array[(Int,Double)])]}} meaning (user, 
> array of items), possibly including the predicted scores as well. See 
> [4713|https://issues.apache.org/jira/browse/FLINK-4713] for details.
> Another question arising is whether to provide this function as a member of 
> the ALS class, as a switch-kind of parameter to the ALS implementation 
> (meaning the model is either a rating or a ranking recommender model) or in 
> some other way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to