[ https://issues.apache.org/jira/browse/SPARK-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826078#comment-15826078 ]
Nick Pentreath commented on SPARK-14409:
----------------------------------------

[~danilo.ascione] [~roberto.mirizzi] thanks for the code examples. Both seem reasonable, and I like the DataFrame-based solutions here. The ideal solution would likely take a few elements from each design.

One aspect that concerns me is how you are generating recommendations from ALS. It appears that you are using the current output of {{ALS.transform}}, which means you are computing a ranking metric in a scenario where you only recommend the subset of user-item combinations that occur in the evaluation data set. It is, in a sense, a "re-ranking" evaluation metric. I'd expect the ranking metric here to quite dramatically overestimate true performance, since in the real world you would generate recommendations from the complete set of available items.

cc [~srowen] thoughts?

> Investigate adding a RankingEvaluator to ML
> -------------------------------------------
>
>                 Key: SPARK-14409
>                 URL: https://issues.apache.org/jira/browse/SPARK-14409
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Nick Pentreath
>            Priority: Minor
>
> {{mllib.evaluation}} contains a {{RankingMetrics}} class, while there is no
> {{RankingEvaluator}} in {{ml.evaluation}}. Such an evaluator can be useful
> for recommendation evaluation (and can be useful in other settings
> potentially).
> Should be thought about in conjunction with adding the "recommendAll" methods
> in SPARK-13857, so that top-k ranking metrics can be used in cross-validators.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
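To make the "re-ranking" concern above concrete, here is a minimal toy sketch (plain Python, not Spark; all item scores and sets are hypothetical illustration data, not from the thread). It compares precision@k when candidates are restricted to the pairs present in the evaluation set versus when the model must rank the full item catalog; with these made-up scores, the restricted metric looks perfect while the full-catalog metric is zero.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    return len(set(ranked[:k]) & relevant) / k

# Hypothetical model scores for one user over a 10-item catalog.
scores = {0: 0.5, 1: 0.6, 2: 0.9, 3: 0.85, 4: 0.8,
          5: 0.75, 6: 0.7, 7: 0.05, 8: 0.02, 9: 0.01}

relevant = {0, 1}          # items the user actually interacted with
eval_subset = {0, 1, 7}    # only user-item pairs present in the eval set

# "Re-ranking" evaluation: rank only the eval-set candidates.
subset_ranked = sorted(eval_subset, key=scores.get, reverse=True)

# Realistic evaluation: rank the complete catalog.
full_ranked = sorted(scores, key=scores.get, reverse=True)

p_subset = precision_at_k(subset_ranked, relevant, k=2)
p_full = precision_at_k(full_ranked, relevant, k=2)

print(f"precision@2 on eval subset only: {p_subset}")
print(f"precision@2 on full catalog:     {p_full}")
```

The gap arises because the restricted candidate pool excludes the high-scoring irrelevant items (2-6 here) that would crowd out the relevant ones in a real top-k recommendation setting.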