I have confirmed this is fixed in Spark 1.6.1 RC 1. Thanks.

On Tue, Feb 23, 2016 at 1:32 PM, Daniel Siegmann <daniel.siegm...@teamaol.com> wrote:
> Yes, I will test once 1.6.1 RC1 is released. Thanks.
>
> On Mon, Feb 22, 2016 at 6:24 PM, Michael Armbrust <mich...@databricks.com> wrote:
>
>> I think this will be fixed in 1.6.1. Can you test when we post the first
>> RC? (hopefully later today)
>>
>> On Mon, Feb 22, 2016 at 1:51 PM, Daniel Siegmann <daniel.siegm...@teamaol.com> wrote:
>>
>>> Experimenting with datasets in Spark 1.6.0, I ran into a serialization
>>> error when using case classes containing a Seq member. There is no
>>> problem when using Array instead. Nor is there a problem using RDD or
>>> DataFrame (even if converting the DF to a DS later).
>>>
>>> Here's an example you can test in the Spark shell:
>>>
>>> import sqlContext.implicits._
>>>
>>> case class SeqThing(id: String, stuff: Seq[Int])
>>> val seqThings = Seq(SeqThing("A", Seq()))
>>> val seqData = sc.parallelize(seqThings)
>>>
>>> case class ArrayThing(id: String, stuff: Array[Int])
>>> val arrayThings = Seq(ArrayThing("A", Array()))
>>> val arrayData = sc.parallelize(arrayThings)
>>>
>>> // Array works fine
>>> arrayData.collect()
>>> arrayData.toDF.as[ArrayThing]
>>> arrayData.toDS
>>>
>>> // Seq can't convert directly to DS
>>> seqData.collect()
>>> seqData.toDF.as[SeqThing]
>>> seqData.toDS // Serialization exception
>>>
>>> Is this working as intended? Are there plans to support serializing
>>> arbitrary Seq values in datasets, or must everything be converted to
>>> Array?
>>>
>>> ~Daniel Siegmann
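[Editor's note] For anyone still on Spark 1.6.0, a minimal sketch of the Array-based workaround the question hints at, not taken from the thread itself: it assumes the same shell session, with sqlContext.implicits._ imported and the SeqThing case class, ArrayThing case class, and seqData RDD defined exactly as in the example above.

// Workaround sketch for 1.6.0: map the Seq member to an Array before
// creating the Dataset, since Array encoders work (as shown above).
// ArrayThing is simply the Array-membered mirror of SeqThing from the example.
val workaroundDS = seqData
  .map(t => ArrayThing(t.id, t.stuff.toArray))  // Seq[Int] -> Array[Int]
  .toDS

workaroundDS.collect()  // completes without the serialization exception

On 1.6.1 RC1 and later, seqData.toDS should work directly, as confirmed at the top of the thread, so this conversion is only needed on 1.6.0.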