Hi Matei.

I myself would prefer Maven over SBT.

On the one hand, in my own experience Maven is better at resolving library
dependencies. I have used both SBT and Maven to build Spark. With SBT I
often have to add custom resolvers to the build script to make the build
pass, while I don't have to do that for Maven. (Maybe I missed something
here... I'm not an SBT expert anyway :( )
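For example, the workaround typically looked something like this in the
build definition (the Akka repository is one I remember actually needing;
the second entry is a made-up placeholder for whichever repository hosted
the missing artifact):

    // Excerpt from an SBT build definition (e.g. project/SparkBuild.scala).
    // Extra resolvers added by hand so dependency resolution would succeed.
    resolvers ++= Seq(
      "Akka Repository" at "http://repo.akka.io/releases/",
      "Internal Mirror" at "http://repo.example.com/maven2/"  // hypothetical
    )

With Maven I never needed the equivalent <repositories> entries.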

On the other hand, the open source community seems much more familiar with
Maven than with SBT. If we'd like more people to contribute to Spark, I
think using Maven is a good idea.
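By the way, on the hadoop-client-plus-reflection idea you mention below: a
minimal sketch of what I imagine is below, using the TaskAttemptContext
class-vs-interface split between Hadoop 1.x and 2.x as the example. Treat
the shim object and its names as illustrative, not a concrete proposal:

    // Sketch: instantiate whichever TaskAttemptContext implementation is on
    // the classpath, so one Spark binary can run against Hadoop 1.x or 2.x
    // without statically referencing a version-specific class.
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.mapreduce.TaskAttemptID

    object HadoopShim {
      def newTaskAttemptContext(conf: Configuration, id: TaskAttemptID): AnyRef = {
        val clazz =
          try {
            // Hadoop 2.x: TaskAttemptContext is an interface; this is its impl.
            Class.forName("org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl")
          } catch {
            case _: ClassNotFoundException =>
              // Hadoop 1.x: TaskAttemptContext is itself a concrete class.
              Class.forName("org.apache.hadoop.mapreduce.TaskAttemptContext")
          }
        val ctor = clazz.getConstructor(classOf[Configuration], classOf[TaskAttemptID])
        ctor.newInstance(conf, id).asInstanceOf[AnyRef]
      }
    }

Callers cast the result to whichever TaskAttemptContext type they compiled
against; the point is only that no version-specific class appears in the
compiled references.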

Thanks,
Shane


On Tue, Jul 16, 2013 at 8:41 AM, Matei Zaharia <[email protected]> wrote:

> Hi all,
>
> I wanted to bring up a topic that there isn't a 100% perfect solution for,
> but that's been bothering the team at Berkeley for a while: consolidating
> Spark's build system. Right now we have two build systems, Maven and SBT,
> that need to be maintained together on each change. We added Maven a while
> back to try it as an alternative to SBT and to get some better publishing
> options, like Debian packages and classifiers, but we've found that 1) SBT
> has actually been fairly stable since then (unlike the rapid release cycle
> before) and 2) classifiers don't actually seem to work for publishing
> versions of Spark with different dependencies (you need to give them
> different artifact names). More importantly though, because maintaining two
> systems is confusing, it would be good to converge to just one soon, or to
> find a better way of maintaining the builds.
>
> In terms of which system to go for, neither is perfect, but I think many
> of us are leaning toward SBT, because it's noticeably faster and it has
> less code to maintain. If we do this, however, I'd really like to
> understand the use cases for Maven, and make sure that either we can
> support them in SBT or we can do them externally. Can people say a bit
> about that? The ones I've thought of are the following:
>
> - Debian packaging -- this is certainly nice, but there are some plugins
> for SBT too, so it may be possible to migrate.
> - BigTop integration -- I'm not sure how much this relies on Maven, but Cos
> has been using it.
> - Classifiers for hadoop1 and hadoop2 -- as far as I can tell, these don't
> really work if you want to publish to Maven Central; you still need two
> artifact names because the artifacts have different dependencies. However,
> more importantly, we'd like to make Spark work with all Hadoop versions by
> using hadoop-client and a bit of reflection, similar to how projects like
> Parquet handle this.
>
> Are there other things I'm missing here, or other ways to handle this
> problem? For example, one possibility would be to keep the
> Maven build scripts in a separate repo managed by the people who want to
> use them, or to have some dedicated maintainers for them. But because this
> is often an issue, I do think it would be simpler for the project to have
> one build system in the long term. In either case though, we will keep the
> project structure compatible with Maven, so people who want to use it
> internally should be fine; I think that we've done this well and, if
> anything, we've simplified the Maven build process lately by removing Twirl.
>
> Anyway, as I said, I don't think any solution is perfect here, but I'm
> curious to hear your input.
>
> Matei




-- 
Shane Huang
Intel Asia-Pacific R&D Ltd.
Email: [email protected]
