GitHub user gaborhermann opened a pull request:

    https://github.com/apache/flink/pull/2838

    [FLINK-4712] [FLINK-4713] [ml] Ranking recommendation & evaluation (WIP)

    Please note that this is a work-in-progress PR for discussing API design
decisions. We propose here a class hierarchy for fitting ranking evaluations
into the proposed evaluation framework (see
[PR](https://github.com/apache/flink/pull/1849)).
    The features are mostly working, but documentation is missing and some
minor refactoring is needed. The evaluations currently work only with top-100
rankings (hard-coded), and we still need to fix that. We need feedback on two
main design decisions so that we can move forward with the PR. Thanks for any
comments!
    
    ### `RankingPredictor`
    
    We have managed to rework the evaluation framework proposed by @thvasilo
so that ranking predictions fit in. Our approach is to use separate
`RankingPredictor` and `Predictor` traits. One main problem remains, however:
there is no common superclass of `RankingPredictor` and `Predictor`, so the
pipelining mechanism might not work. A `Predictor` can only be at the end of
a pipeline, so this should not really be a problem, but I am not sure. An
alternative solution would be to have distinct objects `ALS` and `RankingALS`
that give different predictions but both extend only `Predictor`, with
implicit conversions between the two. I would prefer the current solution if
it does not break pipelining. @thvasilo, what do you think about this?
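
    To make the discussion concrete, here is a minimal sketch of the two
traits we have in mind. The signatures are simplified and only illustrative;
they do not reflect the exact code in this PR:

    ```scala
    // Illustrative sketch only: simplified signatures, not the exact traits in the PR.
    import org.apache.flink.api.scala.DataSet

    // The existing kind of predictor: one predicted value per test instance.
    trait Predictor[Self] {
      def predict[Testing, Prediction](testing: DataSet[Testing]): DataSet[Prediction]
    }

    // The proposed separate trait for ranking recommenders: for every user it
    // produces (user, item, rank) triples, e.g. the top-k recommended items.
    trait RankingPredictor[Self] {
      def predictRankings(k: Int, users: DataSet[Int]): DataSet[(Int, Int, Int)]
    }

    // A recommender such as ALS would extend both traits, but the two traits
    // share no common superclass, which is the pipelining concern above.
    ```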
    
    (This seems to be a problem similar to having a `predict_proba` function
in scikit-learn classification models, where the same model gives two
different predictions for the same input: `predict` for discrete predictions
and `predict_proba` for probabilities.)
    
    ### Generalizing `EvaluateDataSetOperation`
    
    On the other hand, we seem to have solved the scoring issue. Users can
evaluate a recommendation algorithm such as ALS either with a score operating
on rankings (e.g. nDCG) or with a score operating on ratings (e.g. RMSE).
They only need to change the `Score` they use in their code, and nothing
else.
    
    The main problem was that the `evaluate` method and
`EvaluateDataSetOperation` were not general enough. They reduce the
evaluation input to `(trueValue, predictedValue)` pairs (i.e. a
`DataSet[(PredictionType, PredictionType)]`), while ranking evaluations need
a more general input consisting of the true ratings
(`DataSet[(Int,Int,Double)]`) and the predicted rankings
(`DataSet[(Int,Int,Int)]`).
    
    Instead of using `EvaluateDataSetOperation` we use a more general
`PrepareOperation`. We rename the `Score` in the original evaluation
framework to `PairwiseScore`; `RankingScore` and `PairwiseScore` share a
common trait `Score`. This way the user can use either a `RankingScore` or a
`PairwiseScore` for a given model, and only needs to change the score used in
the code.
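
    A rough sketch of the hierarchy we have in mind is shown below. The names
follow the description above, but the signatures are simplified and
illustrative; the actual code in the PR is more general (type parameters,
`ParameterMap`, implicit operations, etc.):

    ```scala
    // Rough, simplified sketch of the proposed score hierarchy.
    import org.apache.flink.api.scala.DataSet

    // Common trait so that evaluation can accept either kind of score.
    trait Score[PreparedTesting] {
      def evaluate(prepared: PreparedTesting): DataSet[Double]
    }

    // The former `Score`: operates on (trueValue, predictedValue) pairs, e.g. RMSE.
    trait PairwiseScore[PredictionType]
      extends Score[DataSet[(PredictionType, PredictionType)]]

    // Ranking scores, e.g. nDCG: operate on the true ratings and predicted rankings.
    trait RankingScore
      extends Score[(DataSet[(Int, Int, Double)], DataSet[(Int, Int, Int)])]

    // A PrepareOperation turns the raw test DataSet into the input a Score needs;
    // for pairwise scores, EvaluateDataSetOperation plays this role.
    trait PrepareOperation[Instance, Testing, PreparedTesting] {
      def prepare(instance: Instance, testing: DataSet[Testing]): PreparedTesting
    }
    ```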
    
    In the case of pairwise scores (which only need true and predicted value
pairs for evaluation), `EvaluateDataSetOperation` is used as a
`PrepareOperation`. It prepares the evaluation by creating `(trueValue,
predictedValue)` pairs from the test dataset, so the result of preparation,
and hence the input of `PairwiseScore`s, is a
`DataSet[(PredictionType,PredictionType)]`. In the case of rankings, the
`PrepareOperation` passes the test dataset through and creates the predicted
rankings, so the result of preparation, and hence the input of
`RankingScore`s, is `(DataSet[(Int,Int,Double)], DataSet[(Int,Int,Int)])`. I
believe this is a fairly acceptable solution that avoids breaking the API.
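
    From the user's point of view, switching between a rating-based and a
ranking-based evaluation would then look roughly like the sketch below. The
`evaluate` call on the model and the score class names (`RmseScore`,
`NdcgScore`) are hypothetical placeholders to illustrate the idea, not the
exact API in this PR:

    ```scala
    // Purely illustrative usage sketch; RmseScore, NdcgScore and the evaluate
    // call on the model are hypothetical placeholders, not the exact API.
    import org.apache.flink.api.scala._
    import org.apache.flink.ml.recommendation.ALS

    val env = ExecutionEnvironment.getExecutionEnvironment

    // (user, item, rating) triples
    val training = env.readCsvFile[(Int, Int, Double)]("/path/to/train.csv")
    val test = env.readCsvFile[(Int, Int, Double)]("/path/to/test.csv")

    val als = ALS().setNumFactors(10).setIterations(10)
    als.fit(training)

    // Rating-based evaluation with a pairwise score (e.g. RMSE):
    val rmse = als.evaluate(test, new RmseScore)

    // Ranking-based evaluation: only the score changes (e.g. nDCG):
    val ndcg = als.evaluate(test, new NdcgScore)
    ```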

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gaborhermann/flink ranking-rec-eval

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2838.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2838
    