Re: Serializing collections in Datasets
I have confirmed this is fixed in Spark 1.6.1 RC 1. Thanks.

On Tue, Feb 23, 2016 at 1:32 PM, Daniel Siegmann <daniel.siegm...@teamaol.com> wrote:
> Yes, I will test once 1.6.1 RC1 is released. Thanks.
Re: Serializing collections in Datasets
Yes, I will test once 1.6.1 RC1 is released. Thanks.

On Mon, Feb 22, 2016 at 6:24 PM, Michael Armbrust wrote:
> I think this will be fixed in 1.6.1. Can you test when we post the first RC? (hopefully later today)
Re: Serializing collections in Datasets
It works in 2.0.0-SNAPSHOT.

On Mon, Feb 22, 2016 at 6:24 PM, Michael Armbrust wrote:
> I think this will be fixed in 1.6.1. Can you test when we post the first RC? (hopefully later today)
Re: Serializing collections in Datasets
I think this will be fixed in 1.6.1. Can you test when we post the first RC? (hopefully later today)

On Mon, Feb 22, 2016 at 1:51 PM, Daniel Siegmann <daniel.siegm...@teamaol.com> wrote:
> Experimenting with datasets in Spark 1.6.0 I ran into a serialization error when using case classes containing a Seq member.
Serializing collections in Datasets
Experimenting with Datasets in Spark 1.6.0, I ran into a serialization error when using case classes containing a Seq member. There is no problem when using Array instead. Nor is there a problem using RDD or DataFrame (even when converting the DF to a DS later).

Here's an example you can test in the Spark shell:

    import sqlContext.implicits._

    case class SeqThing(id: String, stuff: Seq[Int])
    val seqThings = Seq(SeqThing("A", Seq()))
    val seqData = sc.parallelize(seqThings)

    case class ArrayThing(id: String, stuff: Array[Int])
    val arrayThings = Seq(ArrayThing("A", Array()))
    val arrayData = sc.parallelize(arrayThings)

    // Array works fine
    arrayData.collect()
    arrayData.toDF.as[ArrayThing]
    arrayData.toDS

    // Seq can't convert directly to DS
    seqData.collect()
    seqData.toDF.as[SeqThing]
    seqData.toDS  // Serialization exception

Is this working as intended? Are there plans to support serializing arbitrary Seq values in Datasets, or must everything be converted to Array?

~Daniel Siegmann
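[A workaround implied by the thread, for anyone stuck on 1.6.0 before the fix: copy the Seq member into an Array-backed case class before calling toDS, so the records take the Array path that the example shows works. This is only a sketch using the thread's own class names; the toArrayThing helper is hypothetical (not from the thread), and the final toDS call assumes a Spark shell session as in the example above.]

    // Sketch of a pre-1.6.1 workaround: map Seq-backed records to an
    // Array-backed case class before creating the Dataset.
    case class SeqThing(id: String, stuff: Seq[Int])
    case class ArrayThing(id: String, stuff: Array[Int])

    // Hypothetical helper: copy the Seq contents into an Array.
    def toArrayThing(s: SeqThing): ArrayThing =
      ArrayThing(s.id, s.stuff.toArray)

    val seqThings = Seq(SeqThing("A", Seq(1, 2, 3)))
    val arrayThings = seqThings.map(toArrayThing)

    // In the Spark shell, arrayThings can then take the working Array path:
    // sc.parallelize(arrayThings).toDS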