Henry, our hope is to avoid creating two different Hadoop profiles altogether by using the hadoop-client package and reflection. This is what projects like Parquet (https://github.com/Parquet) are doing. If this works out, you get one artifact that can link against any Hadoop version that includes hadoop-client (which I believe means 1.2 onward).
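To make this concrete, here is a rough sketch of the kind of shim I have in mind -- hypothetical code, not anything committed, though the JobContext/JobContextImpl split it handles is a real API change between Hadoop 1.x and 2.x:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.{JobContext, JobID}

// Hypothetical shim: compile once against hadoop-client, and resolve the
// classes that moved between Hadoop versions at runtime via reflection.
object HadoopShim {
  // In Hadoop 1.x, JobContext is a concrete class with a
  // (Configuration, JobID) constructor; in 2.x it became an interface,
  // implemented by JobContextImpl. Reflection lets one artifact handle both.
  def newJobContext(conf: Configuration, jobId: JobID): JobContext = {
    val klass =
      try {
        Class.forName("org.apache.hadoop.mapreduce.task.JobContextImpl") // 2.x
      } catch {
        case _: ClassNotFoundException =>
          Class.forName("org.apache.hadoop.mapreduce.JobContext")        // 1.x
      }
    val ctor = klass.getDeclaredConstructor(classOf[Configuration], classOf[JobID])
    ctor.newInstance(conf, jobId).asInstanceOf[JobContext]
  }
}

Only the handful of call sites that differ between versions would go through a shim like this; everything else compiles against hadoop-client directly, so the same jar runs on either line.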
Matei

On Jul 16, 2013, at 1:26 PM, Henry Saputra <[email protected]> wrote:

> Hi Matei,
>
> Thanks for bringing up this build system discussion.
>
> Some CI tools like Hudson can support multiple Maven profiles via
> different jobs, so we could deliver different release artifacts for
> different Maven profiles. I believe it should be fine to have
> Spark-hadoop1 and Spark-hadoop2 release modules.
> Just curious, how does SBT actually avoid/resolve this problem? To
> support different Hadoop versions we need to change SparkBuild.scala to
> make it work.
>
> And as far as maintaining just one build system, I am +1 for it. I
> prefer to use Maven because it has better dependency management than SBT.
>
> Thanks,
>
> Henry
>
>
> On Mon, Jul 15, 2013 at 5:41 PM, Matei Zaharia <[email protected]> wrote:
> Hi all,
>
> I wanted to bring up a topic that there isn't a 100% perfect solution
> for, but that's been bothering the team at Berkeley for a while:
> consolidating Spark's build system. Right now we have two build systems,
> Maven and SBT, that need to be maintained together on each change. We
> added Maven a while back to try it as an alternative to SBT and to get
> some better publishing options, like Debian packages and classifiers,
> but we've found that 1) SBT has actually been fairly stable since then
> (unlike the rapid release cycle before) and 2) classifiers don't
> actually seem to work for publishing versions of Spark with different
> dependencies (you need to give them different artifact names). More
> importantly, though, because maintaining two systems is confusing, it
> would be good to converge on just one soon, or to find a better way of
> maintaining the builds.
>
> In terms of which system to go for, neither is perfect, but I think many
> of us are leaning toward SBT, because it's noticeably faster and it has
> less code to maintain. If we do this, however, I'd really like to
> understand the use cases for Maven, and make sure that either we can
> support them in SBT or we can do them externally. Can people say a bit
> about that? The ones I've thought of are the following:
>
> - Debian packaging -- this is certainly nice, but there are some plugins
> for SBT too, so it may be possible to migrate.
> - BigTop integration -- I'm not sure how much this relies on Maven, but
> Cos has been using it.
> - Classifiers for hadoop1 and hadoop2 -- as far as I can tell, these
> don't really work if you want to publish to Maven Central; you still
> need two artifact names because the artifacts have different
> dependencies. However, more importantly, we'd like to make Spark work
> with all Hadoop versions by using hadoop-client and a bit of reflection,
> similar to how projects like Parquet handle this.
>
> Are there other things I'm missing here, or other ways to handle this
> problem? For example, one possibility would be to keep the Maven build
> scripts in a separate repo managed by the people who want to use them,
> or to have some dedicated maintainers for them. But because this is
> often an issue, I do think it would be simpler for the project to have
> one build system in the long term. In either case, though, we will keep
> the project structure compatible with Maven, so people who want to use
> it internally should be fine; I think that we've done this well and, if
> anything, we've simplified the Maven build process lately by removing
> Twirl.
>
> Anyway, as I said, I don't think any solution is perfect here, but I'm
> curious to hear your input.
>
> Matei
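P.S. On Henry's question about how SBT avoids the problem today: the build
definition can simply parameterize the Hadoop version, along these lines (a
simplified, hypothetical sketch, not the actual SparkBuild.scala):

import sbt._
import Keys._

object SparkBuild extends Build {
  // Hypothetical: read the Hadoop version from an environment variable,
  // defaulting to a 1.x release, so one build definition covers both lines.
  val hadoopVersion = sys.env.getOrElse("SPARK_HADOOP_VERSION", "1.0.4")

  lazy val core = Project("core", file("core")).settings(
    libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
  )
}

This sidesteps profiles entirely, but it has the same publishing caveat as
classifiers: artifacts built against different Hadoop versions have different
dependencies, so they would still need different artifact names on Maven
Central.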
