Hi, in fact I have just found some written notes in my code... see if these docs help you (they apply to any Spark version, not only 1.3.0):
https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#creating-dataframes

hth

On Sun, Sep 25, 2016 at 1:25 PM, Marco Mistroni <mmistr...@gmail.com> wrote:
> Hi,
>
> I must admit I had issues as well in finding a sample that does that
> (hopefully the Spark folks can add more examples, or someone on the list
> can post sample code?).
>
> Hopefully you can reuse the sample below. You start from an RDD of
> doubles (myRdd):
>
>   import org.apache.spark.sql.Row
>   import org.apache.spark.sql.types.{DoubleType, StructField, StructType}
>
>   // make a row
>   val toRddOfRows = myRdd.map(doubleValues => Row.fromSeq(doubleValues))
>
>   // then you can either call toDF directly and Spark will build a schema
>   // for you; beware, you will need the implicits in scope:
>   import sqlContext.implicits._
>
>   val df = toRddOfRows.toDF()
>
>   // or you can create a schema yourself
>   def createSchema(row: Row) = {
>     val first = row.toSeq
>     val firstWithIdx = first.zipWithIndex
>     val fields = firstWithIdx.map(tpl =>
>       StructField("Col" + tpl._2, DoubleType, false))
>     StructType(fields)
>   }
>
>   val mySchema = createSchema(toRddOfRows.first())
>
>   // returning a DataFrame
>   val mydf = sqlContext.createDataFrame(toRddOfRows, mySchema)
>
> hth
>
> You need to define a schema to make a DF out of your list... check the
> Spark docs on how to make a DF, or some machine learning examples.
>
> On 25 Sep 2016 12:57 pm, "Dan Bikle" <bikle...@gmail.com> wrote:
>
>> Hello World,
>>
>> I am familiar with Python and I am learning Spark-Scala.
>>
>> I want to build a DataFrame which has the structure described by this
>> syntax:
>>
>>   // Prepare training data from a list of (label, features) tuples.
>>   val training = spark.createDataFrame(Seq(
>>     (1.1, Vectors.dense(1.1, 0.1)),
>>     (0.2, Vectors.dense(1.0, -1.0)),
>>     (3.0, Vectors.dense(1.3, 1.0)),
>>     (1.0, Vectors.dense(1.2, -0.5))
>>   )).toDF("label", "features")
>>
>> I got the above syntax from this URL:
>>
>> http://spark.apache.org/docs/latest/ml-pipeline.html
>>
>> Currently my data is in an array which I had pulled out of a DF:
>>
>>   val my_a = gspc17_df.collect().map { row =>
>>     Seq(row(2), Vectors.dense(row(3).asInstanceOf[Double],
>>                               row(4).asInstanceOf[Double]))
>>   }
>>
>> The structure of my array is very similar to the above DF:
>>
>>   my_a: Array[Seq[Any]] =
>>   Array(
>>     List(-1.4830674013266898, [-0.004192832940431825,-0.003170667657263393]),
>>     List(-0.05876766500768526, [-0.008462913654529357,-0.006880595828929472]),
>>     List(1.0109273250546658, [-3.1816797620416693E-4,-0.006502619326182358]))
>>
>> How do I copy the data from my array into a DataFrame with the above
>> structure?
>>
>> I tried this syntax:
>>
>>   val my_df = spark.createDataFrame(my_a).toDF("label","features")
>>
>> Spark barked at me:
>>
>>   <console>:105: error: inferred type arguments [Seq[Any]] do not conform
>>   to method createDataFrame's type parameter bounds [A <: Product]
>>          val my_df = spark.createDataFrame(my_a).toDF("label","features")
>>                            ^
>>   <console>:105: error: type mismatch;
>>    found   : scala.collection.mutable.WrappedArray[Seq[Any]]
>>    required: Seq[A]
>>          val my_df = spark.createDataFrame(my_a).toDF("label","features")
>>                            ^
>>   scala>
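
PS: here is a minimal sketch of how this could be applied to your exact
data. It is untested against your dataset and assumes Spark 2.x, that
gspc17_df holds the label in column 2 and the two feature values in
columns 3 and 4 (as in your snippet); my_rows is just an illustrative
name. The key change is mapping each row to a tuple (a Product type)
instead of a Seq[Any], so createDataFrame can derive the schema:

  import org.apache.spark.ml.linalg.{Vector, Vectors}

  // Tuples are Products, so they satisfy createDataFrame's
  // [A <: Product] bound; Seq[Any] does not.
  val my_rows: Seq[(Double, Vector)] = gspc17_df.collect().toSeq.map { row =>
    (row(2).asInstanceOf[Double],
     Vectors.dense(row(3).asInstanceOf[Double], row(4).asInstanceOf[Double]))
  }

  // ml.linalg.Vector carries its own SQL type, so no explicit schema is needed.
  val my_df = spark.createDataFrame(my_rows).toDF("label", "features")
  my_df.show()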
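
PPS: if you prefer the explicit-schema route from the reply above, a
sketch of that too (again untested, assuming Spark 2.x, where
SQLDataTypes.VectorType is the public DataType for ml vectors; rowRdd
and my_df2 are illustrative names):

  import org.apache.spark.ml.linalg.SQLDataTypes
  import org.apache.spark.sql.Row
  import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

  // Describe the two columns by hand instead of letting Spark infer them.
  val schema = StructType(Seq(
    StructField("label", DoubleType, nullable = false),
    StructField("features", SQLDataTypes.VectorType, nullable = false)))

  // Turn the tuples into Rows and pair them with the schema.
  val rowRdd = spark.sparkContext.parallelize(
    my_rows.map { case (label, features) => Row(label, features) })

  val my_df2 = spark.createDataFrame(rowRdd, schema)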