You guys know the mrlegacy code much better than I do. I was just thinking that 
since it is pretty big, it would be nice to prune some of it from the Scala deps.

Pruning would help with another issue--creating either an “all deps” artifact 
or a managed libs module. I hit another weird bug (that’s two now) that seems 
related to dependencies supplied on the classpath instead of in the jars. So it 
runs on one machine (the dev machine) but not on another (the cluster). This will 
drive users crazy if it happens very often. 

On Dec 12, 2014, at 2:31 PM, Dmitriy Lyubimov <[email protected]> wrote:

On Fri, Dec 12, 2014 at 2:24 PM, Ted Dunning <[email protected]> wrote:
> 
> Hadoop dependencies are a quagmire.
> 
> It would be far preferable to rewrite the necessary serialization to avoid
> Hadoop dependencies entirely.
> 
> If we're dropping the MR code, why do we need to reference the VectorWritable
> class at all?
> 

yes, this is the only form of serialization right now. Yes, it would be
much more preferable to rewrite it without going through Writable, in Kryo
terms.

Given the amount of activity in that domain lately, though, I am just being
realistic here.

But yes, I support getting rid of Writable-type serialization.

We do need the SequenceFile format, though.
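To make the idea concrete: replacing VectorWritable doesn't require Hadoop at all for the wire format itself. Below is a minimal, hypothetical sketch (not Mahout's actual code) of a Writable-free round trip for a dense vector of doubles using only the JDK--a length prefix followed by raw values. A real replacement would register Kryo serializers with Spark instead, but the byte layout would look much the same.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Hypothetical serializer for a dense vector, avoiding Hadoop's
// VectorWritable entirely. Wire format: an Int length prefix,
// then the raw Double values in order.
object VectorSerde {
  def write(v: Array[Double]): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val out = new DataOutputStream(bos)
    out.writeInt(v.length)   // length prefix
    v.foreach(out.writeDouble)
    out.flush()
    bos.toByteArray
  }

  def read(bytes: Array[Byte]): Array[Double] = {
    val in = new DataInputStream(new ByteArrayInputStream(bytes))
    val n = in.readInt()
    Array.fill(n)(in.readDouble())
  }
}
```

With something like this as the value payload, SequenceFile stays just a container of opaque byte records, and no Writable types leak into the Scala API.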

Also keep in mind that Spark brings in Hadoop dependencies as well, which is
also sort of both a blessing and a curse.

A blessing because we don't have to declare a particular Hadoop dependency
any longer.

A curse because, of course, the actual Hadoop version depends on the
parameters of the Spark compilation, not on what the pom and Maven tell us. So
we are constrained to only the pieces that are "forever" compatible across
Hadoop history.

