[ https://issues.apache.org/jira/browse/SPARK-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265238#comment-15265238 ]
Xiangrui Meng commented on SPARK-15027:
---------------------------------------
It might be tricky to use Dataset because of encoders and generic ID types.
But if we use DataFrame as both input and output, it seems feasible. It would
be great if you could take a look.
> ALS.train should use DataFrame instead of RDD
> ---------------------------------------------
>
> Key: SPARK-15027
> URL: https://issues.apache.org/jira/browse/SPARK-15027
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark
> Affects Versions: 2.0.0
> Reporter: Xiangrui Meng
> Assignee: Xiangrui Meng
>
> We should also update `ALS.train` to use `Dataset/DataFrame` instead of `RDD`
> to be consistent with other APIs under spark.ml. This also leaves room for
> Tungsten-based optimization.