Re: choices for deploying a small hadoop cluster on EC2

Prashant Sharma Tue, 29 Nov 2011 12:34:17 -0800

Yes, there is also the Pallet library: https://github.com/pallet/pallet-hadoop-example



On Wed, Nov 30, 2011 at 1:58 AM, Periya.Data <periya.d...@gmail.com> wrote:

> Hi All,
>        I am just beginning to learn how to deploy a small cluster (a
> 3-node cluster) on EC2. After some quick Googling, I see the following
> approaches:
>
>   1. Use Whirr for quick deployment and teardown. It uses CDH3. Does it
>   have features for persisting data (EBS)?
>   2. CDH Cloud Scripts: these provide an EC2 AMI, again for temporary
>   Hadoop clusters, POCs, etc. Good stuff, and I can persist using EBS
>   snapshots. But this uses CDH2.
>   3. Install Hadoop and related tools like Hive manually on each cluster
>   node on EC2 (or use an automation tool like Chef). I would prefer not
>   to do this.
>   4. The Hadoop distribution comes with EC2 scripts (under src/contrib),
>   and there are several Hadoop EC2 AMIs available. I have not studied
>   them enough to know whether they are easy for a beginner like me.
>   5. Anything else?
>
> Options 1 and 2 look promising for a beginner. If any of you have any
> thoughts about this, I would like to hear them (what to keep in mind,
> what to take care of, caveats, etc.). I want my data and config to
> persist (using EBS) so that I can continue from where I left off after
> a few days. Also, I want to have Hive and Sqoop installed. Can this be
> done using 1 or 2? Or will they have to be installed manually after I
> set up the cluster?
>
> Thanks very much,
>
> PD.
>
