You guys know mrlegacy much better than I do. I was just thinking that, since it is pretty big, it would be nice to prune some of it from the Scala deps.
Pruning would help with another issue: creating either an "all deps" artifact or a managed-libs module. I hit another weird bug (that's two now) that seems related to dependencies supplied on the classpath instead of in the jars, so a job runs on one machine (the dev machine) but not on another (the cluster). This will drive users crazy if it happens very often.

On Dec 12, 2014, at 2:31 PM, Dmitriy Lyubimov <[email protected]> wrote:

On Fri, Dec 12, 2014 at 2:24 PM, Ted Dunning <[email protected]> wrote:
>
> Hadoop dependencies are a quagmire.
>
> It would be far preferable to rewrite the necessary serialization to avoid
> Hadoop dependencies entirely.
>
> If we are dropping the MR code, why do we need to reference the
> VectorWritable class at all?
>

Yes, this is the only form of serialization right now. And yes, it would be much preferable to rewrite it without going through Writable, in Kryo terms. Given the amount of activity in that domain lately, though, I am just being realistic here. But yes, I support getting rid of Writable-typed serialization. We do need the SequenceFile format, though.

Also keep in mind that Spark brings Hadoop dependencies as well, which is both a blessing and a curse. A blessing because we no longer have to declare a particular Hadoop dependency; a curse because the actual Hadoop version depends on the parameters of the Spark compilation, not on what the pom and Maven tell us. So we are constrained to only those pieces that are "forever" compatible across Hadoop history.
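[A sketch of the "all deps" artifact idea mentioned above: one conventional way to stop a job from depending on whatever happens to be on the submitting machine's classpath is an uber-jar built with maven-shade-plugin. This is only an illustrative fragment, not the project's actual pom; the `all-deps` classifier name is an assumption.]

```xml
<!-- pom.xml fragment (hypothetical): attach a single "all deps" jar so the
     cluster run sees the same dependencies as the dev machine.
     Spark/Hadoop should stay <scope>provided</scope> in the dependency
     section so the cluster's own versions are not bundled. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <shadedClassifierName>all-deps</shadedClassifierName>
      </configuration>
    </execution>
  </executions>
</plugin>
```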
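[To make the serialization point above concrete: the goal is a payload format that does not reference Hadoop's Writable at all. The thread proposes Kryo for the real rewrite; since Kryo is an external jar, the sketch below uses only plain `java.io` streams to show the same Writable-free round trip for a dense vector. The class and method names are hypothetical, not Mahout API.]

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical Writable-free round trip for a dense vector: a length header
// followed by raw doubles. This stands in for what VectorWritable does today,
// with zero Hadoop dependencies (a Kryo serializer would play the same role).
public class VecRoundTrip {

    static byte[] write(double[] v) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(v.length);              // cardinality header
        for (double d : v) {
            out.writeDouble(d);              // raw payload, no Writable
        }
        out.close();
        return bos.toByteArray();
    }

    static double[] read(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        int n = in.readInt();
        double[] v = new double[n];
        for (int i = 0; i < n; i++) {
            v[i] = in.readDouble();
        }
        in.close();
        return v;
    }

    public static void main(String[] args) throws IOException {
        double[] v = {1.0, 2.5, -3.0};
        double[] back = read(write(v));
        System.out.println(Arrays.equals(v, back)); // prints "true"
    }
}
```

The SequenceFile container itself could stay as-is; only the value serialization inside it would change.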
