@Nick, on a fresh EC2 instance a significant chunk of the initial build time is likely spent on artifact resolution and downloading. Putting pre-populated Ivy and Maven caches onto your EC2 machine could shave a fair amount of time off that first build.
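Something like this is what I have in mind (an untested sketch; the user name and `<new-instance>` host are placeholders, adjust for your AMI):

    # On a machine that has already built Spark once, snapshot the caches:
    tar czf build-caches.tar.gz -C "$HOME" .ivy2 .m2/repository

    # Copy the tarball to the fresh instance and unpack it before building:
    scp build-caches.tar.gz ec2-user@<new-instance>:
    ssh ec2-user@<new-instance> 'tar xzf build-caches.tar.gz -C "$HOME"'

If you launch instances often, baking the unpacked caches into the AMI itself would avoid even the copy step.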
On Tue, Dec 8, 2015 at 9:16 AM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> Thanks for the tips, Jakob and Steve.
>
> It looks like my original approach is the best for me, since I'm installing
> Spark on newly launched EC2 instances and can't take advantage of
> incremental compilation.
>
> Nick
>
> On Tue, Dec 8, 2015 at 7:01 AM Steve Loughran <ste...@hortonworks.com> wrote:
>
>> On 7 Dec 2015, at 19:07, Jakob Odersky <joder...@gmail.com> wrote:
>>
>> make-distribution and the second code snippet both create a distribution
>> from a clean state. They therefore require that every source file be
>> compiled, and that takes time (you can maybe tweak some settings or use a
>> newer compiler to gain some speed).
>>
>> I'm inferring from your question that deployment speed is a critical
>> issue for your use case, and that you'd like to build Spark for many
>> (every?) commits in a systematic way. In that case I would suggest you
>> try the second code snippet without the `clean` task and only resort to
>> it if the build fails.
>>
>> On my local machine, an assembly without a clean drops from 6 minutes
>> to 2.
>>
>> regards,
>> --Jakob
>>
>> 1. You can use zinc, where possible, to speed up Scala compilations.
>> 2. You might also consider setting up a local Jenkins VM, hooked to
>> whatever git repo & branch you are working off, and have it do the builds
>> and tests for you. Not so great for interactive dev, though.
>>
>> Finally, on the Mac, the "say" command is pretty handy for letting you
>> know when some work in a terminal is done, so you can kick off the
>> first-thing-in-the-morning build of the SNAPSHOTs:
>>
>> mvn install -DskipTests -Pyarn,hadoop-2.6 -Dhadoop.version=2.7.1; say moo
>>
>> After that you can work on the modules you care about (via the -pl
>> option). That doesn't work if you are running on an EC2 instance, though.
>>
>> On 23 November 2015 at 20:18, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>>
>>> Say I want to build a complete Spark distribution against Hadoop 2.6+
>>> as fast as possible from scratch.
>>>
>>> This is what I’m doing at the moment:
>>>
>>> ./make-distribution.sh -T 1C -Phadoop-2.6
>>>
>>> -T 1C instructs Maven to spin up 1 thread per available core. This
>>> takes around 20 minutes on an m3.large instance.
>>>
>>> I see that spark-ec2, on the other hand, builds Spark as follows
>>> <https://github.com/amplab/spark-ec2/blob/a990752575cd8b0ab25731d7820a55c714798ec3/spark/init.sh#L21-L22>
>>> when you deploy Spark at a specific git commit:
>>>
>>> sbt/sbt clean assembly
>>> sbt/sbt publish-local
>>>
>>> This actually seems slower than using make-distribution.sh.
>>>
>>> Is there a faster way to do this?
>>>
>>> Nick
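P.S. For the archives: the incremental workflow Jakob and Steve describe amounts to something like this (a rough sketch I haven't timed myself; "core" is just an example module):

    # First build: full install without clean, skipping tests.
    mvn install -DskipTests -Pyarn,hadoop-2.6 -Dhadoop.version=2.7.1

    # Later builds: recompile only the module you changed, plus anything
    # that depends on it (-amd = --also-make-dependents), instead of the
    # whole tree.
    mvn install -DskipTests -Pyarn,hadoop-2.6 -Dhadoop.version=2.7.1 -pl core -amd

Whether the second step is safe depends on how far-reaching your changes are; as Jakob says, fall back to a clean build if it fails.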