I have confirmed this is fixed in Spark 1.6.1 RC 1. Thanks.

On Tue, Feb 23, 2016 at 1:32 PM, Daniel Siegmann <daniel.siegm...@teamaol.com> wrote:
> Yes, I will test once 1.6.1 RC1 is released. Thanks.
>
> On Mon, Feb 22, 2016 at 6:24 PM, Michael Armbrust <mich...@databricks.com> wrote:
>
>> I think this will be fixed in 1.6.1. Can you test when we post the first
>> RC? (hopefully later today)
>>
>> On Mon, Feb 22, 2016 at 1:51 PM, Daniel Siegmann <daniel.siegm...@teamaol.com> wrote:
>>
>>> Experimenting with datasets in Spark 1.6.0, I ran into a serialization
>>> error when using case classes containing a Seq member. There is no
>>> problem when using Array instead. Nor is there a problem using RDD or
>>> DataFrame (even if converting the DF to a DS later).
>>>
>>> Here's an example you can test in the Spark shell:
>>>
>>> import sqlContext.implicits._
>>>
>>> case class SeqThing(id: String, stuff: Seq[Int])
>>> val seqThings = Seq(SeqThing("A", Seq()))
>>> val seqData = sc.parallelize(seqThings)
>>>
>>> case class ArrayThing(id: String, stuff: Array[Int])
>>> val arrayThings = Seq(ArrayThing("A", Array()))
>>> val arrayData = sc.parallelize(arrayThings)
>>>
>>> // Array works fine
>>> arrayData.collect()
>>> arrayData.toDF.as[ArrayThing]
>>> arrayData.toDS
>>>
>>> // Seq can't convert directly to DS
>>> seqData.collect()
>>> seqData.toDF.as[SeqThing]
>>> seqData.toDS // Serialization exception
>>>
>>> Is this working as intended? Are there plans to support serializing
>>> arbitrary Seq values in datasets, or must everything be converted to
>>> Array?
>>>
>>> ~Daniel Siegmann
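[Editor's note] For anyone still on Spark 1.6.0, a minimal sketch of the Array-based workaround the question hints at, not taken from the thread itself: it assumes the same shell session, with sqlContext.implicits._ imported and the SeqThing case class, ArrayThing case class, and seqData RDD defined exactly as in the example above.

// Workaround sketch for 1.6.0: map the Seq member to an Array before
// creating the Dataset, since Array encoders work (as shown above).
// ArrayThing is simply the Array-membered mirror of SeqThing from the example.
val workaroundDS = seqData
  .map(t => ArrayThing(t.id, t.stuff.toArray))  // Seq[Int] -> Array[Int]
  .toDS

workaroundDS.collect()  // completes without the serialization exception

On 1.6.1 RC1 and later, seqData.toDS should work directly, as confirmed at the top of the thread, so this conversion is only needed on 1.6.0.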