[ https://issues.apache.org/jira/browse/SPARK-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengruifeng closed SPARK-14938.
--------------------------------
    Resolution: Not A Problem

> Use Datasets.as to improve internal implementation
> --------------------------------------------------
>
>                 Key: SPARK-14938
>                 URL: https://issues.apache.org/jira/browse/SPARK-14938
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: zhengruifeng
>
> As discussed in [https://github.com/apache/spark/pull/11915], we can use the
> {{Dataset.as}} API instead of RDD operations.
> From:
> {code}
> dataset.select(col($(labelCol)).cast(DoubleType), f, w).rdd.map {
>     case Row(label: Double, feature: Double, weight: Double) =>
>         (label, feature, weight)
> }
> {code}
> To:
> {code}
> dataset.select(col($(labelCol)).cast(DoubleType), f, w)
>     .as[(Double, Double, Double)].rdd
> {code}
> From:
> {code}
> dataset.select(col($(featuresCol)), col($(labelCol)).cast(DoubleType),
>         col($(censorCol)))
>     .rdd.map {
>         case Row(features: Vector, label: Double, censor: Double) =>
>             AFTPoint(features, label, censor)
>     }
> {code}
> To:
> {code}
> val sqlContext = dataset.sqlContext
> import sqlContext.implicits._
> dataset.select(col($(featuresCol)).as("features"),
>     col($(labelCol)).cast(DoubleType).as("label"),
>     col($(censorCol)).as("censor")).as[AFTPoint].rdd
> {code}
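> A minimal, self-contained sketch of the same pattern (the case class
> {{LabeledWeight}}, the toy data, and the use of {{SparkSession}} are
> illustrative assumptions, not code from the Spark repo):
> {code}
> import org.apache.spark.sql.SparkSession
>
> // Illustrative case class; column names must match the field names.
> case class LabeledWeight(label: Double, feature: Double, weight: Double)
>
> val spark = SparkSession.builder().master("local[*]").getOrCreate()
> import spark.implicits._
>
> val df = Seq((1.0, 0.5, 2.0), (0.0, 1.5, 1.0)).toDF("label", "feature", "weight")
>
> // Tuple encoder: no Row pattern match needed.
> val tupleRdd = df.as[(Double, Double, Double)].rdd
>
> // Case-class encoder: columns are bound to fields by name.
> val pointRdd = df.as[LabeledWeight].rdd
> {code}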



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
