I believe Kryo serialization uses the runtime class, not the declared class. We have no issues serializing covariant Scala lists.
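A minimal sketch of what I mean, under stated assumptions: it uses plain Java serialization from the standard library rather than Kryo, so that it runs without extra dependencies, and the `Parent`/`Child` classes (with their `name` and `toy` fields) and the `roundTrip` helper are hypothetical names made up for illustration. It widens a `List[Child]` to `List[Parent]` by covariance, serializes it, and checks what comes back.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical hierarchy, named after the Parent/Child example in the quoted mail.
class Parent(val name: String) extends Serializable
class Child(n: String, val toy: String) extends Parent(n)

object RuntimeClassRoundTrip {
  // Serialize a value to bytes and read it back with plain Java serialization.
  def roundTrip[A](value: A): A = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(value) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[A] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val children: List[Child] = List(new Child("a", "ball"), new Child("b", "kite"))
    val parents: List[Parent] = children // allowed because List is covariant

    // The static type is List[Parent], but the serializer sees Child instances.
    val restored: List[Parent] = roundTrip(parents)
    restored.foreach {
      case c: Child => println(s"${c.name} is still a Child with toy ${c.toy}")
      case p        => println(s"${p.name} came back as a plain Parent")
    }
  }
}
```

When compiled and run as a normal program, every element should match the `Child` case: the widening to `List[Parent]` only changes the static type, so nothing is dropped on the wire. As far as I know Kryo behaves the same way in this respect, but the sketch above does not exercise Kryo itself.
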

On Sat, Mar 22, 2014 at 11:59 AM, Pascal Voitot Dev <pascal.voitot....@gmail.com> wrote:

> On Sat, Mar 22, 2014 at 3:45 PM, Michael Armbrust <mich...@databricks.com> wrote:
>
> > > From my experience, covariance often becomes a pain when dealing with
> > > serialization/deserialization (I've experienced a few cases while
> > > developing play-json & datomisca).
> > > Moreover, if you have implicits, variance often becomes a headache...
> >
> > This is exactly the kind of feedback I was hoping for! Can you be any more
> > specific about the kinds of problems you ran into here?
>
> I've been rethinking this topic after writing my first mail.
>
> The problem I was talking about is when you try to use typeclass converters
> and make them contravariant/covariant for input/output. Something like:
>
>     trait Reader[-I, +O] { def read(i: I): O }
>
> Doing this, you soon have implicit collisions and philosophical concerns
> about what it means to serialize/deserialize a Parent class and a Child
> class...
>
> For example, if you have a Reader[I, Dog], you also have a Reader[I, Mammal]
> by covariance.
> Then you use this Reader[I, Mammal] to read a Cat because it's a Mammal.
> But it fails as the original Reader expects the representation of a full
> Dog, not only the part of it corresponding to the Mammal...
>
> So you see here that the problem is in the serialization/deserialization
> mechanism itself.
>
> In your case, you don't have this kind of concern, as JavaSerializer and
> KryoSerializer are more about basic marshalling that operates on the
> low-level class representation, and you don't rely on implicit typeclasses...
>
> So let's consider what you really want, RDD[+T], and see whether it will
> have bad impacts.
>
> If you do:
>
>     val rddChild: RDD[Child] = sc.parallelize(Seq(Child(...), Child(...), ...))
>
> If you perform map/reduce ops on this rddChild, when remoting objects, the
> Spark context will serialize all sequence elements as Child.
>
> But if you do this:
>
>     val rddParent: RDD[Parent] = rddChild
>
> If you perform ops on rddParent, I believe that the serializer should
> serialize elements as Parent elements, certainly losing some data from
> Child.
> On the remote node, they will be deserialized as Parent too, but they
> shouldn't be Child elements anymore.
>
> So, here, if it works as I say (I'm not sure), it would mean the following:
> you have created an RDD from some data and, just by invoking covariance, you
> might have lost data through the remoting mechanism.
>
> Is it something bad in your opinion? (I'm thinking aloud)
>
> Pascal