On Fri, Dec 12, 2014 at 2:24 PM, Ted Dunning <[email protected]> wrote:

> Hadoop dependencies are a quagmire.
>
> It would be far preferable to rewrite the necessary serialization to avoid
> Hadoop dependencies entirely.
>
> If we are dropping the MR code, why do we need to reference the VectorWritable
> class at all?
Yes, this is the only form of serialization right now. And yes, it would be much preferable to rewrite it in Kryo terms, without going through Writable. Given the amount of activity in that area lately, though, I am just being realistic here. But yes, I support getting rid of Writable-type serialization.

We do need the sequence file format, though.

Also keep in mind that Spark brings in Hadoop dependencies as well, which is both a blessing and a curse. A blessing, because we no longer have to declare a particular Hadoop dependency. A curse, because the actual Hadoop version then depends on the parameters Spark was compiled with, not on what the POM and Maven tell us. So we are constrained to only those pieces that are "forever" compatible across Hadoop's history.
