Hi Everyone, Today I merged a few improvements to the Spark EC2 scripts to master. I wanted to take a moment to explain what they are, give some more color on the purpose of these scripts, and describe how we plan to maintain them going forward. First, the new changes:
- Clusters can be created in any region
- We now support the beefier HVM instance types
- A specific version or git tag of Spark can be selected when launching a cluster
- Clusters can now be launched with newer versions of HDFS
- Mesos has been fully replaced with the Standalone scheduler
- There was substantial internal refactoring and clean-up

The purpose of these scripts is to make it extremely easy to create ephemeral Spark clusters on EC2. In the past this has served two audiences: (i) new users who want to experiment with Spark on a real cluster and (ii) developers and researchers testing extensions to Spark. Because these are the main goals, we’ve focused on ease of provisioning and on keeping the cluster environment as simple and predictable as possible. This is in part why we’ve moved from Mesos to the Standalone scheduler (as many know, Spark can run on Mesos, on YARN, and on its own simplified scheduler). We also tightly control the OS, JVM version, installed packages, etc., so we can easily support people on the mailing list who use these scripts to kick the tires with Spark.

If you are running older versions of the ec2 scripts, including those in Spark 0.6/0.7, things will work just as they used to; this only affects 0.8.0 and newer. I also wanted to note that we may extend these scripts over time in ways that break *internal* compatibility with earlier versions. If you are building applications on top of our ec2 scripts, you should fork the `spark-ec2` repository and maintain your own copy of the repo (not sure if anyone’s doing this though…).

Please feel free to test this out in the next few days and report any issues to me/the dev list. Hopefully this change will make it easier for people to get started with Spark even if they have AWS data in other regions.

Thanks,
- Patrick
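
P.S. For anyone who wants to try the new options, a launch invocation looks roughly like the sketch below. The flag names and values here (--region, --instance-type, --spark-version, the keypair and cluster names) are illustrative rather than authoritative, so check ./spark-ec2 --help on your copy of the scripts before relying on them:

    # Launch a 2-slave cluster in eu-west-1 on an HVM instance type,
    # pinned to a specific Spark release (flag names assumed; verify
    # against ./spark-ec2 --help for your version):
    ./spark-ec2 -k my-keypair -i ~/my-keypair.pem -s 2 \
      --region=eu-west-1 \
      --instance-type=cc2.8xlarge \
      --spark-version=0.8.0 \
      launch my-test-cluster

    # Tear the cluster down when you are done, since these clusters
    # are meant to be ephemeral:
    ./spark-ec2 --region=eu-west-1 destroy my-test-cluster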
