[ https://issues.apache.org/jira/browse/SPARK-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhengruifeng closed SPARK-14938.
--------------------------------
    Resolution: Not A Problem

> Use Datasets.as to improve internal implementation
> --------------------------------------------------
>
>                 Key: SPARK-14938
>                 URL: https://issues.apache.org/jira/browse/SPARK-14938
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: zhengruifeng
>
> As discussed in [https://github.com/apache/spark/pull/11915], we can use the
> {{Dataset.as}} API instead of RDD operations.
> From:
> {code}
> dataset.select(col($(labelCol)).cast(DoubleType), f, w).rdd.map {
>   case Row(label: Double, feature: Double, weight: Double) =>
>     (label, feature, weight)
> }
> {code}
> To:
> {code}
> dataset.select(col($(labelCol)).cast(DoubleType), f, w)
>   .as[(Double, Double, Double)].rdd
> {code}
> From:
> {code}
> dataset.select(col($(featuresCol)), col($(labelCol)).cast(DoubleType),
>     col($(censorCol)))
>   .rdd.map {
>     case Row(features: Vector, label: Double, censor: Double) =>
>       AFTPoint(features, label, censor)
>   }
> {code}
> To:
> {code}
> val sqlContext = dataset.sqlContext
> import sqlContext.implicits._
> dataset.select(col($(featuresCol)).as("features"),
>     col($(labelCol)).cast(DoubleType).as("label"),
>     col($(censorCol)).as("censor")).as[AFTPoint].rdd
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
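A minimal self-contained sketch of the tuple case from the issue, assuming a local Spark session is available. The object name `AsExample`, the column names (`label`, `feature`, `weight`), and the sample rows are illustrative only, not from the Spark codebase; both decoding paths should produce the same tuples.

```scala
// Sketch only: contrasts the untyped Row pattern match with the typed
// Dataset.as decode discussed in the issue. Requires a Spark SQL dependency.
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

object AsExample {
  // Decodes the same DataFrame both ways and returns the results for comparison.
  def run(): (Seq[(Double, Double, Double)], Seq[(Double, Double, Double)]) = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("as-demo")
      .getOrCreate()
    import spark.implicits._
    try {
      // Hypothetical sample data standing in for the ML input dataset.
      val df = Seq((1.0, 2.0, 3.0), (4.0, 5.0, 6.0))
        .toDF("label", "feature", "weight")
      val selected =
        df.select(col("label").cast(DoubleType), col("feature"), col("weight"))

      // Untyped path: pattern-match each Row by hand.
      val viaRow = selected.rdd.map {
        case Row(l: Double, f: Double, w: Double) => (l, f, w)
      }.collect().toSeq

      // Typed path: let the encoder decode rows into tuples directly.
      val viaAs = selected.as[(Double, Double, Double)].rdd.collect().toSeq

      (viaRow, viaAs)
    } finally {
      spark.stop()
    }
  }
}
```

The typed path avoids the manual `Row` pattern match and fails at analysis time, rather than at runtime, if the selected columns cannot be encoded as the target type.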