I'd like to chime in as one of the downstream integrators (Bigtop, etc.). As the original author of the Maven packaging assemblies, I might be able to shed some light on the history behind them:
- In order to integrate Spark well into the existing Hadoop stack, it was necessary to have a way to avoid duplicated transitive dependencies and possible conflicts. E.g. the Maven assembly allows us to avoid adding _all_ Hadoop libs and later merely declare a Spark package dependency on the standard Bigtop Hadoop packages. And yes - Bigtop packaging means the naming and layout would be standard across all commercial Hadoop distributions that are worth mentioning: ASF Bigtop convenience binary packages, and Cloudera or Hortonworks packages. Hence, the downstream user doesn't need to spend any effort to make sure that Spark "clicks in" properly.

- Maven provides a relatively easy way to deal with the jar-hell problem, although the original Maven build was just shading everything into a huge lump of class files, oftentimes ending up with classes from different transitive dependencies slamming on top of each other.

Artifact publishing isn't a deciding concern when it comes to sbt vs Maven: it seems to be a no-brainer in both cases. I don't know sbt well enough to say that its assemblies do not or cannot provide the same level of segregation as Maven's, but it seems this way. And that alone is a huge blocker to dropping support for the Maven build.

Now, what are the great benefits that sbt provides?

Regards,
  Cos

On Thu, Feb 20, 2014 at 08:03PM, Patrick Wendell wrote:
> Hey All,
>
> It's very high overhead having two build systems in Spark. Before
> getting into a long discussion about the merits of sbt vs maven, I
> wanted to pose a simple question to the dev list:
>
> Is there anyone who feels that dropping either sbt or maven would have
> a major consequence for them?
>
> And I say "major consequence" meaning something becomes completely
> impossible now and can't be worked around. This is different from an
> "inconvenience", i.e., something which can be worked around but will
> require some investment.
>
> I'm posing the question in this way because, if there are features in
> either build system that are absolutely-un-available in the other,
> then we'll have to maintain both for the time being. I'm merely trying
> to see whether this is the case...
>
> - Patrick