Hi, in fact I have just found some written notes in my code... see if these docs help you (they apply to any Spark version, not only 1.3.0):
https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#creating-dataframes

hth

On Sun, Sep 25, 2016 at 1:25 PM, Marco Mistroni <mmistr...@gmail.com> wrote:
> Hi,
>
> I must admit I had issues as well in finding a sample that does that
> (hopefully the Spark folks can add more examples, or someone on the list
> can post sample code?).
>
> Hopefully you can reuse the sample below. You start from an RDD of
> doubles (myRdd):
>
>   import org.apache.spark.sql.Row
>   import org.apache.spark.sql.types.{DoubleType, StructField, StructType}
>
>   // make a row
>   val toRddOfRows = myRdd.map(doubleValues => Row.fromSeq(doubleValues))
>
>   // then you can either call toDF directly and Spark will build a schema
>   // for you; beware, you will need the implicits in scope:
>   import sqlContext.implicits._
>
>   val df = toRddOfRows.toDF()
>
>   // or you can create a schema yourself
>   def createSchema(row: Row) = {
>     val first = row.toSeq
>     val firstWithIdx = first.zipWithIndex
>     val fields = firstWithIdx.map(tpl =>
>       StructField("Col" + tpl._2, DoubleType, false))
>     StructType(fields)
>   }
>
>   val mySchema = createSchema(toRddOfRows.first())
>
>   // returning a DataFrame
>   val mydf = sqlContext.createDataFrame(toRddOfRows, mySchema)
>
> hth
>
> You need to define a schema to make a DF out of your list... check the
> Spark docs on how to make a DF, or some machine learning examples.
>
> On 25 Sep 2016 12:57 pm, "Dan Bikle" <bikle...@gmail.com> wrote:
>
>> Hello World,
>>
>> I am familiar with Python and I am learning Spark-Scala.
>>
>> I want to build a DataFrame which has the structure described by this
>> syntax:
>>
>>   // Prepare training data from a list of (label, features) tuples.
>>   val training = spark.createDataFrame(Seq(
>>     (1.1, Vectors.dense(1.1, 0.1)),
>>     (0.2, Vectors.dense(1.0, -1.0)),
>>     (3.0, Vectors.dense(1.3, 1.0)),
>>     (1.0, Vectors.dense(1.2, -0.5))
>>   )).toDF("label", "features")
>>
>> I got the above syntax from this URL:
>>
>> http://spark.apache.org/docs/latest/ml-pipeline.html
>>
>> Currently my data is in an array which I had pulled out of a DF:
>>
>>   val my_a = gspc17_df.collect().map { row =>
>>     Seq(row(2), Vectors.dense(row(3).asInstanceOf[Double],
>>                               row(4).asInstanceOf[Double]))
>>   }
>>
>> The structure of my array is very similar to the above DF:
>>
>>   my_a: Array[Seq[Any]] =
>>   Array(
>>     List(-1.4830674013266898, [-0.004192832940431825,-0.003170667657263393]),
>>     List(-0.05876766500768526, [-0.008462913654529357,-0.006880595828929472]),
>>     List(1.0109273250546658, [-3.1816797620416693E-4,-0.006502619326182358]))
>>
>> How do I copy the data from my array into a DataFrame with the above
>> structure?
>>
>> I tried this syntax:
>>
>>   val my_df = spark.createDataFrame(my_a).toDF("label","features")
>>
>> Spark barked at me:
>>
>>   <console>:105: error: inferred type arguments [Seq[Any]] do not conform
>>   to method createDataFrame's type parameter bounds [A <: Product]
>>          val my_df = spark.createDataFrame(my_a).toDF("label","features")
>>                            ^
>>   <console>:105: error: type mismatch;
>>    found   : scala.collection.mutable.WrappedArray[Seq[Any]]
>>    required: Seq[A]
>>          val my_df = spark.createDataFrame(my_a).toDF("label","features")
>>                            ^
>>   scala>
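
PS: here is a minimal sketch of how this could be applied to your exact
data. It is untested against your dataset and assumes Spark 2.x, that
gspc17_df holds the label in column 2 and the two feature values in
columns 3 and 4 (as in your snippet); my_rows is just an illustrative
name. The key change is mapping each row to a tuple (a Product type)
instead of a Seq[Any], so createDataFrame can derive the schema:

  import org.apache.spark.ml.linalg.{Vector, Vectors}

  // Tuples are Products, so they satisfy createDataFrame's
  // [A <: Product] bound; Seq[Any] does not.
  val my_rows: Seq[(Double, Vector)] = gspc17_df.collect().toSeq.map { row =>
    (row(2).asInstanceOf[Double],
     Vectors.dense(row(3).asInstanceOf[Double], row(4).asInstanceOf[Double]))
  }

  // ml.linalg.Vector carries its own SQL type, so no explicit schema is needed.
  val my_df = spark.createDataFrame(my_rows).toDF("label", "features")
  my_df.show()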
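
PPS: if you prefer the explicit-schema route from the reply above, a
sketch of that too (again untested, assuming Spark 2.x, where
SQLDataTypes.VectorType is the public DataType for ml vectors; rowRdd
and my_df2 are illustrative names):

  import org.apache.spark.ml.linalg.SQLDataTypes
  import org.apache.spark.sql.Row
  import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

  // Describe the two columns by hand instead of letting Spark infer them.
  val schema = StructType(Seq(
    StructField("label", DoubleType, nullable = false),
    StructField("features", SQLDataTypes.VectorType, nullable = false)))

  // Turn the tuples into Rows and pair them with the schema.
  val rowRdd = spark.sparkContext.parallelize(
    my_rows.map { case (label, features) => Row(label, features) })

  val my_df2 = spark.createDataFrame(rowRdd, schema)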