Hi All, I am just beginning to learn how to deploy a small 3-node cluster on EC2. After some quick Googling, I see the following approaches:
1. Use Whirr for quick deployment and tear-down. Uses CDH3. Does it have features for persisting data (EBS)?
2. CDH Cloud Scripts - has an EC2 AMI - again for temporary Hadoop clusters, POCs, etc. Good stuff - I can persist using EBS snapshots. But this uses CDH2.
3. Install Hadoop and related tools like Hive manually on each cluster node on EC2 (or use an automation tool like Chef). I would prefer not to do this.
4. The Hadoop distribution ships with EC2 scripts (under src/contrib), and there are several Hadoop EC2 AMIs available. I have not studied this enough to know whether it is easy for a beginner like me.
5. Anything else?

Options 1 and 2 look promising for a beginner. If any of you have thoughts on this, I would like to hear them (what to keep in mind, what to watch out for, caveats, etc.). I want my data and config to persist (using EBS) so that I can continue from where I left off after a few days. I also want Hive and Sqoop installed. Can this be done using 1 or 2, or will I have to install them manually after setting up the cluster?

Thanks very much,
PD
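P.S. For context, here is the sort of whirr.properties I was planning to start from for option 1, based on my reading of the Whirr getting-started docs (untested, and the cluster name and instance counts are just my placeholders):

```properties
# Hypothetical cluster name - replace with your own
whirr.cluster-name=my-hadoop-test

# 1 master (namenode + jobtracker), 2 workers (datanode + tasktracker)
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,2 hadoop-datanode+hadoop-tasktracker

# Run on EC2, pulling AWS credentials from environment variables
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}

# SSH keypair used to reach the nodes
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
```

If I understand the docs correctly, I would then launch with `whirr launch-cluster --config hadoop.properties` and tear down with `whirr destroy-cluster --config hadoop.properties`. What I have not figured out is where EBS persistence (or Hive/Sqoop installation) would fit into this setup, which is really what my questions above are about.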