[jira] [Commented] (SPARK-15027) ALS.train should use DataFrame instead of RDD

Xiangrui Meng (JIRA) Sat, 30 Apr 2016 01:05:15 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265238#comment-15265238
 ]


Xiangrui Meng commented on SPARK-15027:
---------------------------------------

Using Dataset might be tricky because of encoders and generic ID types, but 
using DataFrame as both input and output seems feasible. It would be great if 
you could take a look.

> ALS.train should use DataFrame instead of RDD
> ---------------------------------------------
>
>                 Key: SPARK-15027
>                 URL: https://issues.apache.org/jira/browse/SPARK-15027
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, PySpark
>    Affects Versions: 2.0.0
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>
> We should also update `ALS.train` to use `Dataset/DataFrame` instead of `RDD`, 
> both to be consistent with the other APIs under spark.ml and to leave room for 
> Tungsten-based optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

