I've been setting up a Spark cluster on EC2 using the provided ec2/spark_ec2.py script, and I'm very happy I didn't have to write it from scratch. Thanks for providing it.
There have been some issues, though, and I have had to make some additions. So far they have all been new command-line options. For example, the original script allows access to the various ports from anywhere; I've added an option to specify the net/mask that should be allowed to access them. I've filed a couple of pull requests, but they are not going anywhere.

Given what I've seen of the traffic on this list, I don't get the impression that many of the developers are thinking about EC2 setup. I agree that it is not as important as improving the guts of Spark itself; nevertheless, being able to run Spark on EC2 smartly and easily is valuable. So, I have two questions for the committers:

1. Is ec2/spark_ec2.py something the committers
   a. are not thinking about?
   b. are planning to replace?
   c. other?

2. Should I just start a new project based on ec2/spark_ec2.py, without all the other stuff, and make (and share) my changes there?

Regards,
Art
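P.S. For concreteness, the kind of option I mean looks roughly like this. This is an illustrative sketch, not the exact patch: the option name, port list, and helper names are made up, but the boto `SecurityGroup.authorize` call is the real API the script already uses.

```python
# Sketch of a --authorized-address option for spark_ec2.py (names are
# illustrative, not the actual patch). Instead of opening ports to
# 0.0.0.0/0, ingress rules are restricted to a user-supplied CIDR.
from optparse import OptionParser


def parse_args(argv):
    parser = OptionParser(usage="spark_ec2.py [options] <action> <cluster_name>")
    parser.add_option(
        "--authorized-address", default="0.0.0.0/0",
        help="Address (CIDR net/mask) to authorize on the cluster's ports "
             "(default: 0.0.0.0/0, i.e. the current open-to-anyone behavior)")
    return parser.parse_args(argv)


def authorize_ports(security_group, ports, cidr):
    # Open each TCP port only to the given CIDR rather than the whole
    # internet; security_group is a boto EC2 SecurityGroup.
    for port in ports:
        security_group.authorize(
            ip_protocol="tcp", from_port=port, to_port=port, cidr_ip=cidr)
```

The default keeps the script's current behavior, so existing users are unaffected unless they pass the option.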