Re: Serializing collections in Datasets

2016-03-03 Thread Daniel Siegmann
I have confirmed this is fixed in Spark 1.6.1 RC1. Thanks.
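
For reference, a minimal re-check sketch in the 1.6.1 RC1 shell (assuming
SeqThing and seqData are defined exactly as in the original report below):

seqData.toDS           // previously threw a serialization exception
seqData.toDS.collect() // now succeeds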


Re: Serializing collections in Datasets

2016-02-23 Thread Daniel Siegmann
Yes, I will test once 1.6.1 RC1 is released. Thanks.


Re: Serializing collections in Datasets

2016-02-22 Thread Koert Kuipers
It works in 2.0.0-SNAPSHOT.


Re: Serializing collections in Datasets

2016-02-22 Thread Michael Armbrust
I think this will be fixed in 1.6.1. Can you test when we post the first
RC? (hopefully later today)


Serializing collections in Datasets

2016-02-22 Thread Daniel Siegmann
Experimenting with Datasets in Spark 1.6.0, I ran into a serialization error
when using case classes containing a Seq member. There is no problem when
using Array instead, nor is there a problem with RDD or DataFrame (even if
the DF is converted to a DS later).

Here's an example you can test in the Spark shell:

import sqlContext.implicits._

case class SeqThing(id: String, stuff: Seq[Int])
val seqThings = Seq(SeqThing("A", Seq()))
val seqData = sc.parallelize(seqThings)

case class ArrayThing(id: String, stuff: Array[Int])
val arrayThings = Seq(ArrayThing("A", Array()))
val arrayData = sc.parallelize(arrayThings)


// Array works fine
arrayData.collect()
arrayData.toDF.as[ArrayThing]
arrayData.toDS

// Seq can't convert directly to DS
seqData.collect()
seqData.toDF.as[SeqThing]
seqData.toDS // Serialization exception

Is this working as intended? Are there plans to support serializing
arbitrary Seq values in Datasets, or must everything be converted to Array?
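
In case it helps anyone stuck on 1.6.0 in the meantime, here are two
workaround sketches based on the behavior above (viaDF and viaArray are just
illustrative names, not part of any Spark API):

// Workaround sketch 1: go through a DataFrame, since toDF.as[SeqThing] works
val viaDF = seqData.toDF.as[SeqThing]

// Workaround sketch 2: convert the Seq member to an Array before toDS
val viaArray = seqData.map(t => ArrayThing(t.id, t.stuff.toArray)).toDS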

~Daniel Siegmann