As a customer of the code, I don't care *how* the code gets built, but it is important to me that the Maven artifacts (POM files, binaries, sources, javadocs) are clean, accurate, up to date, and published on Maven Central.
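For concreteness, here is a minimal sketch of what producing that set of
artifacts looks like on the sbt side (sbt 0.13 syntax; the coordinates,
version, and URLs are placeholders, not Spark's actual values):

    // build.sbt -- publishing sketch; all coordinates are illustrative
    organization := "org.example"
    name         := "example-core"
    version      := "0.1.0"

    publishMavenStyle := true                         // generate a Maven-style POM
    publishArtifact in (Compile, packageSrc) := true  // attach the sources jar
    publishArtifact in (Compile, packageDoc) := true  // attach the javadoc jar

    // Maven Central (via Sonatype OSS) requires POM metadata such as
    // url, license, and SCM info; releases missing them are rejected.
    pomExtra :=
      <url>http://example.org/</url>
      <licenses>
        <license>
          <name>Apache License, Version 2.0</name>
          <url>http://www.apache.org/licenses/LICENSE-2.0.html</url>
        </license>
      </licenses>
      <scm>
        <url>http://example.org/repo</url>
        <connection>scm:git:git://example.org/repo.git</connection>
      </scm>

    publishTo := Some(
      "staging" at "https://oss.sonatype.org/service/local/staging/deploy/maven2"
    )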
Some examples where structure/publishing failures have been bad for users:

- For a long time (and perhaps still), Solr and Lucene were built by an Ant
  build that produced incorrect POMs and required potential developers to
  configure their IDEs manually.
- For a long time (and perhaps still), Pig was built by Ant, published
  incorrect POMs, and failed to publish useful auxiliary artifacts like
  PigUnit and the PiggyBank as Maven-addressable artifacts. (That said,
  thanks to Spark, we no longer use Pig...)
- For a long time (and perhaps still), Cassandra depended on
  non-generally-available libraries (high-scale, etc.), which made it
  inconvenient to embed Cassandra in a larger system. Cassandra gets a
  little slack because the build/structure was almost too terrible to look
  at prior to incubation, and it's gotten better...

And those are just a few projects at Apache that come to mind; I could make
a longish list of offenders.

BTW, among other things, the Spark project probably *should* publish
artifacts with a classifier to distinguish the Hadoop version linked
against (a rough sketch of how that might look follows the quoted thread
below).

I'll be a happy user of sbt-built artifacts, and if the project goes with
(or sticks with) Maven, I'm more than willing to help answer questions or
provide PRs for stickier items around assemblies, multiple artifacts, etc.

— p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/


On Thu, Feb 20, 2014 at 11:56 PM, Sean Owen <so...@cloudera.com> wrote:
> Two builds is indeed a pain, since it's an ongoing chore to keep them in
> sync. For example, I am already seeing that the two do not quite declare
> the same dependencies (see recent patch).
>
> I think publishing artifacts to Maven Central should be considered a hard
> requirement if it isn't already one from the ASF, and it may be.
> Certainly most people out there would be shocked if you told them Spark
> is not in the repo at all. And that requires at least maintaining a POM
> that declares the structure of the project.
>
> This does not necessarily mean using Maven to build, but it is a reason
> that removing the POM would make the project a lot harder for people to
> consume.
>
> Maven has its pros and cons, but there are plenty of people lurking
> around who know it quite well. Certainly it's easier for the Hadoop
> people to understand and work with. On the other hand, it supports Scala
> only via a plugin, which is weaker support. sbt seems like a fairly new,
> basic, ad-hoc tool. Is there an advantage to it, other than being
> Scala-based (which is an advantage)?
>
> --
> Sean Owen | Director, Data Science | London
>
>
> On Fri, Feb 21, 2014 at 4:03 AM, Patrick Wendell <pwend...@gmail.com>
> wrote:
> > Hey All,
> >
> > It's very high overhead having two build systems in Spark. Before
> > getting into a long discussion about the merits of sbt vs. Maven, I
> > wanted to pose a simple question to the dev list:
> >
> > Is there anyone who feels that dropping either sbt or Maven would have
> > a major consequence for them?
> >
> > By "major consequence" I mean something that becomes completely
> > impossible and can't be worked around. This is different from an
> > "inconvenience", i.e., something that can be worked around but will
> > require some investment.
> >
> > I'm posing the question this way because, if there are features in
> > either build system that are absolutely unavailable in the other, then
> > we'll have to maintain both for the time being. I'm merely trying to
> > see whether this is the case...
> >
> > - Patrick
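P.S. On the Hadoop-classifier suggestion above, a rough sbt sketch (sbt
0.13 syntax; the "hadoop2" label is made up for illustration, not an
actual Spark classifier):

    // build.sbt -- hypothetical: tag the main jar with a classifier
    // naming the Hadoop version it was linked against.
    artifact in (Compile, packageBin) ~= { art =>
      art.copy(classifier = Some("hadoop2"))  // illustrative label
    }

A consumer would then select the variant explicitly, e.g.:

    libraryDependencies +=
      "org.example" %% "example-core" % "0.1.0" classifier "hadoop2"

In Maven terms this is the same thing as a <classifier> element on the
dependency; either way the point is that the Hadoop linkage becomes part
of the published coordinates instead of a build-time surprise.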