Yes, the Pallet library: https://github.com/pallet/pallet-hadoop-example

On Wed, Nov 30, 2011 at 1:58 AM, Periya.Data <periya.d...@gmail.com> wrote:
> Hi All,
> I am just beginning to learn how to deploy a small cluster (a 3
> node cluster) on EC2. After some quick Googling, I see the following
> approaches:
>
> 1. Use Whirr for quick deployment and tearing down. Uses CDH3. Does it
> have features for persisting (EBS)?
> 2. CDH Cloud Scripts - has EC2 AMI - again for temp Hadoop clusters/POC
> etc. Good stuff - I can persist using EBS snapshots. But, this uses CDH2.
> 3. Install hadoop manually and related stuff like Hive...on each cluster
> node...on EC2 (or use some automation tool like Chef). I do not prefer
> it.
> 4. Hadoop distribution comes with EC2 (under src/contrib) and there are
> several Hadoop EC2 AMIs available. I have not studied enough to know if
> that is easy for a beginner like me.
> 5. Anything else??
>
> 1 and 2 look promising as a beginner. If any of you have any thoughts about
> this, I would like to know (like what to keep in mind, what to take care
> of, caveats etc). I want my data/config to persist (using EBS) and
> continue from where I left off...(after a few days). Also, I want to have
> HIVE and SQOOP installed. Can this be done using 1 or 2? Or, will installation
> of them have to be done manually after I set up the cluster?
>
> Thanks very much,
>
> PD.