Say I want to build a complete Spark distribution against Hadoop 2.6+, from
scratch, as fast as possible.

This is what I’m doing at the moment:

./make-distribution.sh -T 1C -Phadoop-2.6

-T 1C instructs Maven to spin up one thread per available core. The full build
takes around 20 minutes on an m3.large instance.
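For reference, Maven's help text describes -T values such as 2.0C as a
multiplier of the core count, so on an m3.large (2 vCPUs) -T 1C should mean
2 build threads. The small bash sketch below approximates that rule. The
helper name maven_threads is hypothetical, and truncating a fractional result
to an integer is an assumption about how Maven rounds, not something taken
from its source.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate how Maven turns a -T value into a thread count.
# Assumption: "<m>C" means m * cores, truncated to an integer; a plain number is used as-is.
maven_threads() {
  local spec="$1" cores="$2"
  case "$spec" in
    *C) awk -v m="${spec%C}" -v c="$cores" 'BEGIN { printf "%d\n", m * c }' ;;
    *)  echo "$spec" ;;
  esac
}

# An m3.large exposes 2 vCPUs, so -T 1C should give 2 build threads there.
maven_threads 1C 2

# On the current machine (getconf reports the number of online processors).
maven_threads 1C "$(getconf _NPROCESSORS_ONLN)"
```

With only two cores, even a perfectly parallel Maven build gets at most a
2x speedup from -T 1C.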

spark-ec2, on the other hand, builds Spark as follows when you deploy it at a
specific git commit
<https://github.com/amplab/spark-ec2/blob/a990752575cd8b0ab25731d7820a55c714798ec3/spark/init.sh#L21-L22>:

sbt/sbt clean assembly
sbt/sbt publish-local

That actually seems slower than using make-distribution.sh.

Is there a faster way to do this?

Nick