When I attempt to create a dataframe of Array[Double], I get an error about
RDD[Array[Double]] not having a toDF function:

import sqlContext.implicits._
val testvec = Array( Array(1.0, 2.0, 3.0, 4.0), Array(5.0, 6.0, 7.0, 8.0))
val testrdd = sc.parallelize(testvec)
testrdd.toDF

gives

<console>:29: error: value toDF is not a member of
org.apache.spark.rdd.RDD[Array[Double]]
              testrdd.toDF

On the other hand, if I make the element type more complicated, e.g.
Tuple2[String, Array[Double]], the transformation goes through:

val testvec = Array( ("row 1", Array(1.0, 2.0, 3.0, 4.0)), ("row 2",
Array(5.0, 6.0, 7.0, 8.0)) )
val testrdd = sc.parallelize(testvec)
testrdd.toDF

gives
testrdd: org.apache.spark.rdd.RDD[(String, Array[Double])] =
ParallelCollectionRDD[1] at parallelize at <console>:29
res3: org.apache.spark.sql.DataFrame = [_1: string, _2: array<double>]

What's the cause of this, and how can I get around it to create a dataframe
of Array[Double]? My end goal is to store that dataframe in Parquet (and yes,
I do want to store all the values in a single array column, not in individual
columns).

I am using Spark 1.5.2.
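For what it's worth, one workaround I'm considering (a sketch only, and the
column name "values" is my own choice): the toDF implicit seems to require the
RDD's element type to be a Product (a case class or a tuple), which
Array[Double] is not. Wrapping each array in a Tuple1 appears to satisfy that
while still keeping everything in one column:

import sqlContext.implicits._

// Array[Double] is not a Product, so no toDF; Tuple1[Array[Double]] is.
val testvec = Array(Array(1.0, 2.0, 3.0, 4.0), Array(5.0, 6.0, 7.0, 8.0))
val testdf  = sc.parallelize(testvec).map(Tuple1.apply).toDF("values")
// expected schema: [values: array<double>]
testdf.write.parquet("...")  // hypothetical output path

But if there's a cleaner way than the Tuple1 wrapper, I'd like to know it.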

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-make-a-dataframe-of-Array-Doubles-tp25704.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
