Hi All, I am just beginning to learn how to deploy a small 3-node cluster on EC2. After some quick Googling, I see the following approaches:
1. Use Whirr for quick deployment and tear-down. Uses CDH3. Does it have features for persisting data (EBS)?
2. CDH Cloud Scripts - has an EC2 AMI - again for temporary Hadoop clusters, POCs, etc. Good stuff - I can persist using EBS snapshots. But this uses CDH2.
3. Install Hadoop and related tools like Hive manually on each cluster node on EC2 (or use an automation tool like Chef). I would prefer not to do this.
4. The Hadoop distribution ships with EC2 scripts (under src/contrib), and there are several Hadoop EC2 AMIs available. I have not studied this enough to know whether it is easy for a beginner like me.
5. Anything else?

Options 1 and 2 look promising for a beginner. If any of you have thoughts on this, I would like to hear them (what to keep in mind, what to watch out for, caveats, etc.). I want my data and config to persist (using EBS) so that I can continue from where I left off after a few days. I also want Hive and Sqoop installed. Can this be done using 1 or 2, or will I have to install them manually after setting up the cluster?

Thanks very much,
PD
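P.S. For context, here is the sort of whirr.properties I was planning to start from for option 1, based on my reading of the Whirr getting-started docs (untested, and the cluster name and instance counts are just my placeholders):

```properties
# Hypothetical cluster name - replace with your own
whirr.cluster-name=my-hadoop-test

# 1 master (namenode + jobtracker), 2 workers (datanode + tasktracker)
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,2 hadoop-datanode+hadoop-tasktracker

# Run on EC2, pulling AWS credentials from environment variables
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}

# SSH keypair used to reach the nodes
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
```

If I understand the docs correctly, I would then launch with `whirr launch-cluster --config hadoop.properties` and tear down with `whirr destroy-cluster --config hadoop.properties`. What I have not figured out is where EBS persistence (or Hive/Sqoop installation) would fit into this setup, which is really what my questions above are about.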