[ 
https://issues.apache.org/jira/browse/SPARK-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Florian Verhein updated SPARK-5552:
-----------------------------------
    Summary: Automated data science AMI creation and data science cluster 
deployment on EC2  (was: Automated data science AMIs creation and cluster 
deployment on EC2)

> Automated data science AMI creation and data science cluster deployment on EC2
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-5552
>                 URL: https://issues.apache.org/jira/browse/SPARK-5552
>             Project: Spark
>          Issue Type: New Feature
>          Components: EC2
>            Reporter: Florian Verhein
>
> Issue created RE: 
> https://github.com/mesos/spark-ec2/pull/90#issuecomment-72597154 (please read 
> for background)
> Goal:
> Extend the spark-ec2 scripts to support automated data science cluster 
> deployment on EC2, suitable for (almost?) production use.
> Use cases: 
> - A user can build their own custom data science AMIs from a minimal CentOS 
> image by invoking a Packer configuration (good defaults should be provided, 
> with some options for flexibility).
> - A user can then easily deploy a new, correctly configured cluster from 
> these AMIs, as quickly as possible (a rough sketch of both steps is included 
> below).
> Components/modules: Spark + Tachyon + HDFS (on instance storage) + Python + R 
> + Vowpal Wabbit + arbitrary RPMs + ... + Ganglia.
> The focus is on reliability (rather than, e.g., supporting many versions or 
> dev testing) and speed of deployment.
> Use Hadoop 2 so there is the option to move to YARN later.
> My current solution is here: 
> https://github.com/florianverhein/spark-ec2/tree/packer. It includes other 
> fixes/improvements that were needed to get it working.
> Now that it seems to work (but has deviated a lot more from the existing code 
> base than I was expecting), I'm wondering what to do with it...
> Keen to hear ideas if anyone is interested. 
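For illustration only (this is not the implementation in the linked packer
branch), a minimal driver for the two use cases above might look like the
Python sketch below. The Packer template name, key pair, and cluster name
are hypothetical placeholders; `packer build -machine-readable` and the
spark-ec2 `--ami`/`launch` options are the standard interfaces.

#!/usr/bin/env python
# Illustrative sketch only -- not the packer branch linked above.
# Step 1: build a custom data science AMI with Packer.
# Step 2: launch a spark-ec2 cluster from that AMI.
import subprocess

PACKER_TEMPLATE = "data_science_ami.json"  # hypothetical Packer template
CLUSTER_NAME = "ds-cluster"                # hypothetical cluster name


def build_ami(template=PACKER_TEMPLATE):
    """Run `packer build` and return the resulting AMI id.

    Assumes an amazon-ebs builder; the AMI id is taken from the artifact
    line of Packer's machine-readable output, which looks like:
    <timestamp>,amazon-ebs,artifact,0,id,<region>:<ami-id>
    """
    out = subprocess.check_output(
        ["packer", "build", "-machine-readable", template])
    for line in out.decode("utf-8").splitlines():
        fields = line.split(",")
        if len(fields) >= 6 and fields[2] == "artifact" and fields[4] == "id":
            return fields[5].split(":")[-1]
    raise RuntimeError("no AMI id found in packer output")


def launch_cluster(ami_id, slaves=2):
    """Launch a cluster from the custom AMI via spark-ec2's -a/--ami flag."""
    subprocess.check_call([
        "./spark-ec2",
        "--key-pair", "my-keypair",           # assumed existing EC2 key pair
        "--identity-file", "my-keypair.pem",
        "--slaves", str(slaves),
        "--ami", ami_id,                      # use the AMI built in step 1
        "launch", CLUSTER_NAME,
    ])


if __name__ == "__main__":
    launch_cluster(build_ami())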



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
