If you're looking at consolidating build systems, I'd ask that you consider ease of cross-publishing for different Scala versions. My instinct is that sbt will be less troublesome in that regard (although, as I understand it, the changes to the REPL may present a problem).
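
For concreteness, here's a rough sketch of what cross-building looks like on the sbt side -- the version numbers are only placeholders for illustration, not a proposal for what Spark should target:

  // build.sbt -- cross-building sketch; versions are illustrative only
  name := "spark-example"

  scalaVersion := "2.10.1"

  crossScalaVersions := Seq("2.9.3", "2.10.1")

  // "sbt +publish" compiles and publishes the project once per Scala
  // version, appending the Scala binary version to the artifact name
  // (e.g. spark-example_2.9.3, spark-example_2.10).
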
We need to use 2.10 for a project, so I'd be happy to put in some work on the issue.

On Monday, July 15, 2013 7:41:31 PM UTC-5, Matei Zaharia wrote:
>
> Hi all,
>
> I wanted to bring up a topic that there isn't a 100% perfect solution for, but that's been bothering the team at Berkeley for a while: consolidating Spark's build system. Right now we have two build systems, Maven and SBT, that need to be maintained together on each change. We added Maven a while back to try it as an alternative to SBT and to get some better publishing options, like Debian packages and classifiers, but we've found that 1) SBT has actually been fairly stable since then (unlike the rapid release cycle before) and 2) classifiers don't actually seem to work for publishing versions of Spark with different dependencies (you need to give them different artifact names). More importantly though, because maintaining two systems is confusing, it would be good to converge to just one soon, or to find a better way of maintaining the builds.
>
> In terms of which system to go for, neither is perfect, but I think many of us are leaning toward SBT, because it's noticeably faster and it has less code to maintain. If we do this, however, I'd really like to understand the use cases for Maven, and make sure that either we can support them in SBT or we can do them externally. Can people say a bit about that? The ones I've thought of are the following:
>
> - Debian packaging -- this is certainly nice, but there are some plugins for SBT too so may be possible to migrate.
> - BigTop integration; I'm not sure how much this relies on Maven but Cos has been using it.
> - Classifiers for hadoop1 and hadoop2 -- as far as I can tell, these don't really work if you want to publish to Maven Central; you still need two artifact names because the artifacts have different dependencies. However, more importantly, we'd like to make Spark work with all Hadoop versions by using hadoop-client and a bit of reflection, similar to how projects like Parquet handle this.
>
> Are there other things I'm missing here, or other ways to handle this problem that I'm missing? For example, one possibility would be to keep the Maven build scripts in a separate repo managed by the people who want to use them, or to have some dedicated maintainers for them. But because this is often an issue, I do think it would be simpler for the project to have one build system in the long term. In either case though, we will keep the project structure compatible with Maven, so people who want to use it internally should be fine; I think that we've done this well and, if anything, we've simplified the Maven build process lately by removing Twirl.
>
> Anyway, as I said, I don't think any solution is perfect here, but I'm curious to hear your input.
>
> Matei
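
On the hadoop-client-plus-reflection point in Matei's mail, here's roughly the kind of shim I'd picture. The class and constructor names below are from memory (TaskAttemptContext is a concrete class in Hadoop 1 but an interface in Hadoop 2), so treat this as a sketch to be checked against the real Hadoop versions, not a tested implementation:

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.mapreduce.{TaskAttemptContext, TaskAttemptID}

  object HadoopShim {
    // Pick the concrete TaskAttemptContext implementation at runtime instead
    // of linking against a single Hadoop version at compile time.
    def newTaskAttemptContext(conf: Configuration, id: TaskAttemptID): TaskAttemptContext = {
      val klass =
        try {
          // Hadoop 2.x: TaskAttemptContext is an interface, so use the Impl class.
          Class.forName("org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl")
        } catch {
          case _: ClassNotFoundException =>
            // Hadoop 1.x: TaskAttemptContext is itself a concrete class.
            Class.forName("org.apache.hadoop.mapreduce.TaskAttemptContext")
        }
      val ctor = klass.getConstructor(classOf[Configuration], classOf[TaskAttemptID])
      ctor.newInstance(conf, id).asInstanceOf[TaskAttemptContext]
    }
  }

A handful of shims like this for whatever APIs actually differ would, in principle, let a single published artifact run against either Hadoop line, which seems preferable to juggling classifiers or per-Hadoop artifact names.
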
