If you're looking at consolidating build systems, I'd ask that you consider ease of cross-publishing for different Scala versions. My instinct is that sbt will be less troublesome in that regard (although, as I understand it, the changes to the REPL may present a problem).
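
For concreteness, here's a rough sketch of what cross-building looks like on the sbt side -- the version numbers are only placeholders for illustration, not a proposal for what Spark should target:

  // build.sbt -- cross-building sketch; versions are illustrative only
  name := "spark-example"

  scalaVersion := "2.10.1"

  crossScalaVersions := Seq("2.9.3", "2.10.1")

  // "sbt +publish" compiles and publishes the project once per Scala
  // version, appending the Scala binary version to the artifact name
  // (e.g. spark-example_2.9.3, spark-example_2.10).
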
We need to use 2.10 for a project, so I'd be happy to put in some work on the issue.

On Monday, July 15, 2013 7:41:31 PM UTC-5, Matei Zaharia wrote:
>
> Hi all,
>
> I wanted to bring up a topic that there isn't a 100% perfect solution for, but that's been bothering the team at Berkeley for a while: consolidating Spark's build system. Right now we have two build systems, Maven and SBT, that need to be maintained together on each change. We added Maven a while back to try it as an alternative to SBT and to get some better publishing options, like Debian packages and classifiers, but we've found that 1) SBT has actually been fairly stable since then (unlike the rapid release cycle before) and 2) classifiers don't actually seem to work for publishing versions of Spark with different dependencies (you need to give them different artifact names). More importantly though, because maintaining two systems is confusing, it would be good to converge to just one soon, or to find a better way of maintaining the builds.
>
> In terms of which system to go for, neither is perfect, but I think many of us are leaning toward SBT, because it's noticeably faster and it has less code to maintain. If we do this, however, I'd really like to understand the use cases for Maven, and make sure that either we can support them in SBT or we can do them externally. Can people say a bit about that? The ones I've thought of are the following:
>
> - Debian packaging -- this is certainly nice, but there are some plugins for SBT too so may be possible to migrate.
> - BigTop integration; I'm not sure how much this relies on Maven but Cos has been using it.
> - Classifiers for hadoop1 and hadoop2 -- as far as I can tell, these don't really work if you want to publish to Maven Central; you still need two artifact names because the artifacts have different dependencies. However, more importantly, we'd like to make Spark work with all Hadoop versions by using hadoop-client and a bit of reflection, similar to how projects like Parquet handle this.
>
> Are there other things I'm missing here, or other ways to handle this problem that I'm missing? For example, one possibility would be to keep the Maven build scripts in a separate repo managed by the people who want to use them, or to have some dedicated maintainers for them. But because this is often an issue, I do think it would be simpler for the project to have one build system in the long term. In either case though, we will keep the project structure compatible with Maven, so people who want to use it internally should be fine; I think that we've done this well and, if anything, we've simplified the Maven build process lately by removing Twirl.
>
> Anyway, as I said, I don't think any solution is perfect here, but I'm curious to hear your input.
>
> Matei
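
On the hadoop-client-plus-reflection point in Matei's mail, here's roughly the kind of shim I'd picture. The class and constructor names below are from memory (TaskAttemptContext is a concrete class in Hadoop 1 but an interface in Hadoop 2), so treat this as a sketch to be checked against the real Hadoop versions, not a tested implementation:

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.mapreduce.{TaskAttemptContext, TaskAttemptID}

  object HadoopShim {
    // Pick the concrete TaskAttemptContext implementation at runtime instead
    // of linking against a single Hadoop version at compile time.
    def newTaskAttemptContext(conf: Configuration, id: TaskAttemptID): TaskAttemptContext = {
      val klass =
        try {
          // Hadoop 2.x: TaskAttemptContext is an interface, so use the Impl class.
          Class.forName("org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl")
        } catch {
          case _: ClassNotFoundException =>
            // Hadoop 1.x: TaskAttemptContext is itself a concrete class.
            Class.forName("org.apache.hadoop.mapreduce.TaskAttemptContext")
        }
      val ctor = klass.getConstructor(classOf[Configuration], classOf[TaskAttemptID])
      ctor.newInstance(conf, id).asInstanceOf[TaskAttemptContext]
    }
  }

A handful of shims like this for whatever APIs actually differ would, in principle, let a single published artifact run against either Hadoop line, which seems preferable to juggling classifiers or per-Hadoop artifact names.
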
