[ https://issues.apache.org/jira/browse/SPARK-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Florian Verhein updated SPARK-5552:
-----------------------------------
    Summary: Automated data science AMI creation and data science cluster deployment on EC2  (was: Automated data science AMIs creation and cluster deployment on EC2)

> Automated data science AMI creation and data science cluster deployment on EC2
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-5552
>                 URL: https://issues.apache.org/jira/browse/SPARK-5552
>             Project: Spark
>          Issue Type: New Feature
>          Components: EC2
>            Reporter: Florian Verhein
>
> Issue created re: https://github.com/mesos/spark-ec2/pull/90#issuecomment-72597154 (please read for background).
>
> Goal:
> Extend the spark-ec2 scripts to provide automated data science AMI creation and data science cluster deployment on EC2, suitable for almost(?)-production use.
>
> Use cases:
> - A user can build their own custom data science AMIs from a CentOS minimal image by calling a Packer configuration (good defaults should be provided, with some options for flexibility).
> - A user can then easily deploy a new, correctly configured cluster using these AMIs, and do so as quickly as possible.
>
> Components/modules: Spark + Tachyon + HDFS (on instance storage) + Python + R + Vowpal Wabbit + any RPMs + ... + Ganglia.
>
> The focus is on reliability (rather than, e.g., supporting many versions or dev testing) and on speed of deployment. Use Hadoop 2 so there is the option to move to YARN later.
>
> My current solution is here: https://github.com/florianverhein/spark-ec2/tree/packer. It includes other fixes/improvements as needed to get it working.
>
> Now that it seems to work (but has deviated a lot more from the existing code base than I was expecting), I'm wondering what to do with it...
>
> Keen to hear ideas if anyone is interested.
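For context on the "build AMIs by calling a Packer configuration" use case, a minimal Packer template for baking a custom AMI from a base image might look like the sketch below. This is not the actual configuration from the linked spark-ec2/packer branch; the region, source AMI ID, SSH username, and provisioning commands are illustrative placeholders.

```json
{
  "builders": [
    {
      "type": "amazon-ebs",
      "region": "us-east-1",
      "source_ami": "ami-xxxxxxxx",
      "instance_type": "m3.large",
      "ssh_username": "centos",
      "ami_name": "spark-data-science-{{timestamp}}"
    }
  ],
  "provisioners": [
    {
      "type": "shell",
      "inline": [
        "sudo yum -y install java-1.7.0-openjdk-devel R"
      ]
    }
  ]
}
```

Here `"source_ami"` would point at the CentOS minimal image, and the shell provisioner would be extended to install Spark, Tachyon, HDFS, Python, Vowpal Wabbit, and any other RPMs. The template would be run with `packer build template.json`, producing a pre-baked AMI so that cluster launch time is spent on configuration rather than software installation.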
-- This message was sent by Atlassian JIRA (v6.3.4#6332)