[
https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822021#comment-15822021
]
Danilo Ascione edited comment on SPARK-13857 at 1/17/17 4:12 PM:
-----------------------------------------------------------------
I have a pipeline similar to [~abudd2014]'s. I have implemented a
DataFrame-API-based RankingEvaluator that computes the top K recommendations
during the evaluation phase of the pipeline, so it can be used in a model
selection pipeline (cross-validation).
Sample usage code:
{code}
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
// RankingEvaluator is the evaluator proposed in the pull request linked below

// Input DataFrame columns: (userId, itemId, clicked)
val als = new ALS()
  .setUserCol("userId")
  .setItemCol("itemId")
  .setRatingCol("clicked")
  .setImplicitPrefs(true)

val paramGrid = new ParamGridBuilder()
  .addGrid(als.regParam, Array(0.01, 0.1))
  .addGrid(als.alpha, Array(40.0, 1.0))
  .build()

val evaluator = new RankingEvaluator()
  .setMetricName("mpr") // Mean Percentile Rank
  .setLabelCol("itemId")
  .setPredictionCol("prediction")
  .setQueryCol("userId")
  .setK(5) // Top K

val cv = new CrossValidator()
  .setEstimator(als)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(3)

val crossValidatorModel = cv.fit(inputDF)

// Average metric per ParamGrid entry
val avgMetricsParamGrid = crossValidatorModel.avgMetrics
// Pair each ParamMap with its average metric and print the result
val combined = paramGrid.zip(avgMetricsParamGrid)
combined.foreach { case (params, metric) => println(s"$params -> $metric") }
{code}
The resulting "bestModel" of the cross-validation model is then used to
generate the top K recommendations in batches.
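To make the batch step concrete, here is a minimal plain-Scala sketch of top K scoring. It is not the code from my pipeline and it does not use any Spark API: the object name TopKSketch, the helper names, and the in-memory Map inputs are all hypothetical. It only assumes that a score is the dot product of a user factor vector and an item factor vector, as ALS does, and that users are processed in fixed-size groups.

```scala
object TopKSketch {
  type Factors = Array[Double]

  // Dot product of two factor vectors of equal length.
  def dot(a: Factors, b: Factors): Double = {
    require(a.length == b.length, "factor vectors must have the same rank")
    var sum = 0.0
    var i = 0
    while (i < a.length) { sum += a(i) * b(i); i += 1 }
    sum
  }

  // Score every item for one user and keep the k highest-scoring items.
  def topK(user: Factors, items: Map[Int, Factors], k: Int): Seq[(Int, Double)] =
    items.toSeq
      .map { case (itemId, f) => itemId -> dot(user, f) }
      .sortBy { case (_, score) => -score }
      .take(k)

  // Hypothetical batching: handle users in groups of batchSize.
  def recommendInBatches(
      users: Map[Int, Factors],
      items: Map[Int, Factors],
      k: Int,
      batchSize: Int): Map[Int, Seq[(Int, Double)]] =
    users.toSeq.grouped(batchSize).flatMap { batch =>
      batch.map { case (userId, f) => userId -> topK(f, items, k) }
    }.toMap
}
```

In a real job the factor vectors would come from the fitted ALSModel (its userFactors and itemFactors DataFrames) rather than from in-memory maps, and the sort would be replaced by something cheaper for large item sets.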
The RankingEvaluator code is here:
[https://github.com/apache/spark/pull/16618/files#diff-0345c4cb1878d3bb0d84297202fdc95f]
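For readers unfamiliar with the "mpr" metric, the following is a rough standalone sketch of Mean Percentile Rank in plain Scala. It is an illustration only, not the implementation in the pull request: the object name MprSketch, the input shapes, and the choice to ignore held-out items missing from a user's ranked list are assumptions. It uses the usual definition, where the top-ranked item has percentile 0.0, the last has 1.0, and the metric is the weight-averaged percentile of the held-out interactions (lower is better).

```scala
object MprSketch {
  // Percentile rank per item in a ranked list: 0.0 for the first, 1.0 for the last.
  def percentileRanks(ranked: Seq[Int]): Map[Int, Double] = {
    val denom = math.max(ranked.size - 1, 1).toDouble
    ranked.zipWithIndex.map { case (item, idx) => item -> idx / denom }.toMap
  }

  // recs:     user -> recommended items, best first
  // observed: held-out (user, item, weight) interactions
  def meanPercentileRank(
      recs: Map[Int, Seq[Int]],
      observed: Seq[(Int, Int, Double)]): Double = {
    val ranks = recs.map { case (user, items) => user -> percentileRanks(items) }
    // Assumption: interactions whose item is absent from the ranked list are skipped.
    val scored = observed.flatMap { case (user, item, weight) =>
      ranks.get(user).flatMap(_.get(item)).map(rank => (weight * rank, weight))
    }
    val weightSum = scored.map(_._2).sum
    if (weightSum == 0.0) Double.NaN else scored.map(_._1).sum / weightSum
  }
}
```

With this definition a random ranking is expected to score around 0.5, so values well below that suggest that held-out items tend to sit near the top of the recommendations.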
I would appreciate any feedback. Thanks!
was (Author: danilo.ascione):
(Text and sample code identical to the comment above; only the RankingEvaluator code link differed.)
RankingEvaluator code is here
[https://github.com/daniloascione/spark/commit/c93ab86d35984e9f70a3b4f543fb88f5541333f0]
> Feature parity for ALS ML with MLLIB
> ------------------------------------
>
> Key: SPARK-13857
> URL: https://issues.apache.org/jira/browse/SPARK-13857
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Reporter: Nick Pentreath
> Assignee: Nick Pentreath
>
> Currently {{mllib.recommendation.MatrixFactorizationModel}} has methods
> {{recommendProducts/recommendUsers}} for recommending top K to a given user /
> item, as well as {{recommendProductsForUsers/recommendUsersForProducts}} to
> recommend top K across all users/items.
> Additionally, SPARK-10802 is for adding the ability to do
> {{recommendProductsForUsers}} for a subset of users (or vice versa).
> Look at exposing or porting (as appropriate) these methods to ALS in ML.
> Investigate if efficiency can be improved at the same time (see SPARK-11968).