[ https://issues.apache.org/jira/browse/SPARK-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Armbrust updated SPARK-7160:
------------------------------------
            Priority: Critical  (was: Major)
    Target Version/s: 1.5.0
            Shepherd: Michael Armbrust

> Support converting DataFrames to typed RDDs.
> --------------------------------------------
>
>                 Key: SPARK-7160
>                 URL: https://issues.apache.org/jira/browse/SPARK-7160
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 1.3.1
>            Reporter: Ray Ortigas
>            Assignee: Ray Ortigas
>            Priority: Critical
>
> As a Spark user still working with RDDs, I'd like the ability to convert a
> DataFrame to a typed RDD.
> For example, if I've converted RDDs to DataFrames so that I could save them
> as Parquet or CSV files, I would like to rebuild the RDD from those files
> automatically rather than writing the row-to-type conversion myself.
> {code}
> val rdd0 = sc.parallelize(Seq(Food("apple", 1), Food("banana", 2),
>   Food("cherry", 3)))
> val df0 = rdd0.toDF()
> df0.save("foods.parquet")
> val df1 = sqlContext.load("foods.parquet")
> val rdd1 = df1.toTypedRDD[Food]()
> // rdd0 and rdd1 should have the same elements
> {code}
> I originally submitted a smaller PR for spark-csv
> <https://github.com/databricks/spark-csv/pull/52>, but Reynold Xin suggested
> that converting a DataFrame to a typed RDD wasn't something specific to
> spark-csv.
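
For context, the "row-to-type conversion" the issue wants to avoid writing by hand might look roughly like the sketch below. It assumes the Spark 1.3-era API (`SQLContext`, `DataFrame.rdd`, pattern matching on `Row`) and a hypothetical `Food(name: String, count: Int)` case class; the issue does not show the definition of `Food`, so the field names are illustrative only. The object name `ManualTypedRdd` and the helper `rowToFood` are likewise made up for this example, and this is not the implementation proposed in the ticket.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}

// Hypothetical definition: the issue only shows Food("apple", 1), not its fields.
case class Food(name: String, count: Int)

object ManualTypedRdd {
  // Hand-written conversion from an untyped Row back to Food.
  // A row with a different shape or column types fails with a MatchError.
  def rowToFood(row: Row): Food = row match {
    case Row(name: String, count: Int) => Food(name, count)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("manual-typed-rdd").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // enables rdd.toDF() for case classes

    val rdd0 = sc.parallelize(Seq(Food("apple", 1), Food("banana", 2), Food("cherry", 3)))
    val df0 = rdd0.toDF()

    // DataFrame -> RDD[Row] -> RDD[Food], written out by hand.
    val rdd1: RDD[Food] = df0.rdd.map(rowToFood)

    // rdd0 and rdd1 should have the same elements.
    assert(rdd1.collect().toSet == rdd0.collect().toSet)
    sc.stop()
  }
}
```

Every case class needs its own `rowToFood`-style function under this approach, which is the boilerplate a generic `toTypedRDD[T]` would remove.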