[ https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236796#comment-15236796 ]
Nick Pentreath commented on SPARK-13857:
----------------------------------------
[~mengxr] [~josephkb]
In an ideal world, this is what a train-validation split with ALS would look like:
{code}
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.ml.tuning.{ParamGridBuilder, TrainValidationSplit}

// Prepare training and test data.
val ratings = ...
val Array(training, test) = ratings.randomSplit(Array(0.8, 0.2))

// Set up ALS with top-k prediction.
val als = new ALS()
  .setMaxIter(5)
  .setImplicitPrefs(true)
  .setK(10)
  .setTopKInputCol("user")
  .setTopKOutputCol("topk")

// Build the param grid.
val paramGrid = new ParamGridBuilder()
  .addGrid(als.regParam, Array(0.01, 0.05, 0.1))
  .addGrid(als.alpha, Array(1.0, 10.0, 20.0))
  .build()

// Ranking evaluator with the appropriate prediction column.
val evaluator = new RankingEvaluator()
  .setPredictionCol("topk")
  .setMetricName("mapk")
  .setK(10)
  .setLabelCol("actual")

val trainValidationSplit = new TrainValidationSplit()
  .setEstimator(als)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(paramGrid)
  // 80% of the data will be used for training and the remaining 20% for validation.
  .setTrainRatio(0.8)

// Run train validation split, and choose the best set of parameters.
val model = trainValidationSplit.fit(training)

// Make predictions on test data. model is the model with the combination of
// parameters that performed best.
model.transform(test)
  .select("user", "actual", "topk")
  .show()
{code}
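The proposed {{RankingEvaluator}} with metric name "mapk" would score each user's {{topk}} list against the {{actual}} column. As a rough illustration of what such a metric computes, here is a plain-Scala sketch of mean average precision at k, assuming {{topk}} holds ranked item ids and {{actual}} holds the set of relevant item ids for each user. {{MapAtK}} and its methods are hypothetical names, and normalising by min(|actual|, k) is one common convention, not necessarily what the evaluator would use.

{code}
// Hypothetical helper, not part of Spark: MAP@k over (topk, actual) pairs.
object MapAtK {
  /** Average precision at k for one user.
    * predicted: ranked item ids (the proposed "topk" column)
    * actual:    relevant item ids (the proposed "actual" label column) */
  def averagePrecisionAtK(predicted: Seq[Int], actual: Set[Int], k: Int): Double =
    if (actual.isEmpty || k <= 0) 0.0
    else {
      var hits = 0
      var precisionSum = 0.0
      // Add precision@i at every (1-based) rank i where a relevant item appears.
      predicted.take(k).zipWithIndex.foreach { case (item, i) =>
        if (actual.contains(item)) {
          hits += 1
          precisionSum += hits.toDouble / (i + 1)
        }
      }
      // Normalise by the largest number of hits possible in the top k.
      precisionSum / math.min(actual.size, k)
    }

  /** Mean of the per-user average precisions; 0.0 for an empty input. */
  def meanAveragePrecisionAtK(rows: Seq[(Seq[Int], Set[Int])], k: Int): Double =
    if (rows.isEmpty) 0.0
    else rows.map { case (p, a) => averagePrecisionAtK(p, a, k) }.sum / rows.size
}
{code}

For example, {{MapAtK.meanAveragePrecisionAtK(Seq((Seq(1, 2, 3), Set(1, 3)), (Seq(4, 5), Set(6))), 3)}} gives 5/12: the first user scores (1/1 + 2/3) / 2 = 5/6 and the second user scores 0.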
> Feature parity for ALS ML with MLLIB
> ------------------------------------
>
> Key: SPARK-13857
> URL: https://issues.apache.org/jira/browse/SPARK-13857
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Reporter: Nick Pentreath
> Assignee: Nick Pentreath
>
> Currently {{mllib.recommendation.MatrixFactorizationModel}} has methods
> {{recommendProducts/recommendUsers}} for recommending top K to a given user /
> item, as well as {{recommendProductsForUsers/recommendUsersForProducts}} to
> recommend top K across all users/items.
> Additionally, SPARK-10802 is for adding the ability to do
> {{recommendProductsForUsers}} for a subset of users (or vice versa).
> Look at exposing or porting (as appropriate) these methods to ALS in ML.
> Investigate if efficiency can be improved at the same time (see SPARK-11968).
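The quoted description covers {{recommendProducts/recommendUsers}} (top K for one user or item), {{recommendProductsForUsers/recommendUsersForProducts}} (top K for every user or item), and SPARK-10802's request to do this for a subset of users. In ALS a predicted score is the dot product of a user factor vector and an item factor vector, so top-K recommendation amounts to scoring and ranking. The following is a naive, single-machine sketch of that idea, assuming factors are held in plain {{Map[Int, Array[Double]]}} values. {{TopKRecommend}} and its methods are hypothetical names rather than Spark's API, and the sketch makes no attempt at the efficiency work referenced in SPARK-11968.

{code}
// Hypothetical, brute-force illustration; not Spark's implementation.
object TopKRecommend {
  type Factors = Map[Int, Array[Double]]

  private def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "factor vectors must have the same rank")
    var s = 0.0
    var i = 0
    while (i < a.length) { s += a(i) * b(i); i += 1 }
    s
  }

  /** Top-k (itemId, score) pairs for one user factor vector, best first;
    * ties are broken by the smaller item id. */
  def recommendForUser(user: Array[Double], items: Factors, k: Int): Seq[(Int, Double)] =
    items.toSeq
      .map { case (itemId, v) => (itemId, dot(user, v)) }
      .sortWith((a, b) => a._2 > b._2 || (a._2 == b._2 && a._1 < b._1))
      .take(k)

  /** Top-k items for every user, or only for the users in `subset` if given. */
  def recommendForUsers(users: Factors, items: Factors, k: Int,
                        subset: Option[Set[Int]] = None): Map[Int, Seq[(Int, Double)]] =
    users
      .filter { case (userId, _) => subset.forall(_.contains(userId)) }
      .map { case (userId, u) => userId -> recommendForUser(u, items, k) }
}
{code}

Passing {{subset = None}} covers the recommend-for-all-users case; recommending users for an item is the same computation with the roles of the two factor maps swapped.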
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)