Henry, our hope is to avoid creating two different Hadoop profiles altogether by using the hadoop-client package and reflection. This is what projects like Parquet (https://github.com/Parquet) are doing. If this works out, you get one artifact that can link against any Hadoop version that includes hadoop-client (which I believe means 1.2 onward).
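To make this concrete, here is a rough sketch of the kind of shim I have in mind -- hypothetical code, not anything committed, though the JobContext/JobContextImpl split it handles is a real API change between Hadoop 1.x and 2.x:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.{JobContext, JobID}

// Hypothetical shim: compile once against hadoop-client, and resolve the
// classes that moved between Hadoop versions at runtime via reflection.
object HadoopShim {
  // In Hadoop 1.x, JobContext is a concrete class with a
  // (Configuration, JobID) constructor; in 2.x it became an interface,
  // implemented by JobContextImpl. Reflection lets one artifact handle both.
  def newJobContext(conf: Configuration, jobId: JobID): JobContext = {
    val klass =
      try {
        Class.forName("org.apache.hadoop.mapreduce.task.JobContextImpl") // 2.x
      } catch {
        case _: ClassNotFoundException =>
          Class.forName("org.apache.hadoop.mapreduce.JobContext")        // 1.x
      }
    val ctor = klass.getDeclaredConstructor(classOf[Configuration], classOf[JobID])
    ctor.newInstance(conf, jobId).asInstanceOf[JobContext]
  }
}

Only the handful of call sites that differ between versions would go through a shim like this; everything else compiles against hadoop-client directly, so the same jar runs on either line.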
Matei

On Jul 16, 2013, at 1:26 PM, Henry Saputra <[email protected]> wrote:

> Hi Matei,
>
> Thanks for bringing up this build system discussion.
>
> Some CI tools like Hudson can support multiple Maven profiles via
> different jobs, so we could deliver different release artifacts for
> different Maven profiles. I believe it should be fine to have
> Spark-hadoop1 and Spark-hadoop2 release modules.
> Just curious, how does SBT actually avoid/resolve this problem? To
> support different Hadoop versions we need to change SparkBuild.scala to
> make it work.
>
> And as far as maintaining just one build system, I am +1 for it. I
> prefer to use Maven because it has better dependency management than SBT.
>
> Thanks,
>
> Henry
>
>
> On Mon, Jul 15, 2013 at 5:41 PM, Matei Zaharia <[email protected]> wrote:
> Hi all,
>
> I wanted to bring up a topic that there isn't a 100% perfect solution
> for, but that's been bothering the team at Berkeley for a while:
> consolidating Spark's build system. Right now we have two build systems,
> Maven and SBT, that need to be maintained together on each change. We
> added Maven a while back to try it as an alternative to SBT and to get
> some better publishing options, like Debian packages and classifiers,
> but we've found that 1) SBT has actually been fairly stable since then
> (unlike the rapid release cycle before) and 2) classifiers don't
> actually seem to work for publishing versions of Spark with different
> dependencies (you need to give them different artifact names). More
> importantly, though, because maintaining two systems is confusing, it
> would be good to converge on just one soon, or to find a better way of
> maintaining the builds.
>
> In terms of which system to go for, neither is perfect, but I think many
> of us are leaning toward SBT, because it's noticeably faster and it has
> less code to maintain. If we do this, however, I'd really like to
> understand the use cases for Maven, and make sure that either we can
> support them in SBT or we can do them externally. Can people say a bit
> about that? The ones I've thought of are the following:
>
> - Debian packaging -- this is certainly nice, but there are some plugins
> for SBT too, so it may be possible to migrate.
> - BigTop integration -- I'm not sure how much this relies on Maven, but
> Cos has been using it.
> - Classifiers for hadoop1 and hadoop2 -- as far as I can tell, these
> don't really work if you want to publish to Maven Central; you still
> need two artifact names because the artifacts have different
> dependencies. However, more importantly, we'd like to make Spark work
> with all Hadoop versions by using hadoop-client and a bit of reflection,
> similar to how projects like Parquet handle this.
>
> Are there other things I'm missing here, or other ways to handle this
> problem? For example, one possibility would be to keep the Maven build
> scripts in a separate repo managed by the people who want to use them,
> or to have some dedicated maintainers for them. But because this is
> often an issue, I do think it would be simpler for the project to have
> one build system in the long term. In either case, though, we will keep
> the project structure compatible with Maven, so people who want to use
> it internally should be fine; I think that we've done this well and, if
> anything, we've simplified the Maven build process lately by removing
> Twirl.
>
> Anyway, as I said, I don't think any solution is perfect here, but I'm
> curious to hear your input.
>
> Matei
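P.S. On Henry's question about how SBT avoids the problem today: the build
definition can simply parameterize the Hadoop version, along these lines (a
simplified, hypothetical sketch, not the actual SparkBuild.scala):

import sbt._
import Keys._

object SparkBuild extends Build {
  // Hypothetical: read the Hadoop version from an environment variable,
  // defaulting to a 1.x release, so one build definition covers both lines.
  val hadoopVersion = sys.env.getOrElse("SPARK_HADOOP_VERSION", "1.0.4")

  lazy val core = Project("core", file("core")).settings(
    libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
  )
}

This sidesteps profiles entirely, but it has the same publishing caveat as
classifiers: artifacts built against different Hadoop versions have different
dependencies, so they would still need different artifact names on Maven
Central.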
