Andrew,

One thing I noticed with the ec2 scripts is that they don't update the 'slaves' file for Hadoop or the 'regionservers' file for HBase. As a result, when I stop HBase and Hadoop, the instances on the slaves don't stop. Not sure if others have experienced this.

Also, I believe this has been pointed out before, but it would be nice if the hbase jar and zookeeper jar were copied automatically to the hadoop lib directory, because they are needed by MapReduce.

Thanks.
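For anyone else hitting these two issues, a rough workaround sketch follows. It assumes the default /usr/local install paths on the AMIs and the standard EC2 API tools on the PATH; the cluster/security group name and the awk field positions for the ec2-describe-instances output are assumptions that may need adjusting for your setup.

  CLUSTER=testcluster

  # Rebuild the slave host lists from the instances running in the
  # cluster's security group, so the stop scripts can reach every node
  # over SSH.
  ec2-describe-instances | \
    awk -v g="$CLUSTER" '$1 == "RESERVATION" { grp = $4 }
         $1 == "INSTANCE" && grp == g && $6 == "running" { print $5 }' \
    > /tmp/$CLUSTER-hosts

  cp /tmp/$CLUSTER-hosts /usr/local/hadoop-*/conf/slaves
  cp /tmp/$CLUSTER-hosts /usr/local/hbase-*/conf/regionservers

  # Copy the HBase and ZooKeeper jars into Hadoop's lib directory so
  # MapReduce tasks can load them (repeat on every node, then restart
  # the TaskTrackers; jar names vary by release).
  cp /usr/local/hbase-*/hbase-*.jar /usr/local/hadoop-*/lib/
  cp /usr/local/hbase-*/lib/zookeeper-*.jar /usr/local/hadoop-*/lib/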
On Fri, Dec 11, 2009 at 10:10 AM, Andrew Purtell <[email protected]> wrote:

> > Problem is - I don't know what HBase configurations to use
> > in my MapReduce program to point to HBase on another EC2 machine.
>
> 1) Copy the hbase-site.xml from the HBase cluster master
> (/usr/local/hbase*/conf/hbase-site.xml) and put it on the classpath
> on your Hadoop cluster. Make sure you have a hbase-default.xml on
> the classpath on the Hadoop cluster also.
>
> 2) Make sure your Hadoop cluster instances can communicate with
> the HBase zookeeper, master, and slave security groups. Typically
> this means you have to execute a number of ec2-authorize commands
> of the form:
>
> ec2-authorize <group-1> -o <group-2> -u <account-id>
> ec2-authorize <group-2> -o <group-1> -u <account-id>
>
> where group-1 ranges over all of your Hadoop cluster's security
> groups, and group-2 over all of your HBase cluster's security
> groups. It's annoying, but you only have to do it once and the
> changes will persist in your security group ACLs.
>
> - Andy
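A concrete sketch of both steps, with placeholder values: the Hadoop group names and account id below are examples only (the HBase script creates the <cluster>, <cluster>-master, and <cluster>-zookeeper groups seen later in this thread), and the paths assume the AMI defaults.

  # 1) Put the HBase configuration on the Hadoop cluster's classpath.
  scp root@<hbase-master>:/usr/local/hbase-*/conf/hbase-site.xml \
      /usr/local/hadoop-*/conf/
  scp root@<hbase-master>:/usr/local/hbase-*/conf/hbase-default.xml \
      /usr/local/hadoop-*/conf/

  # 2) Authorize the security groups in both directions.
  ACCOUNT=<account-id>
  HADOOP_GROUPS="hadoopcluster hadoopcluster-master"
  HBASE_GROUPS="testcluster testcluster-master testcluster-zookeeper"

  for g1 in $HADOOP_GROUPS; do
    for g2 in $HBASE_GROUPS; do
      ec2-authorize $g1 -o $g2 -u $ACCOUNT
      ec2-authorize $g2 -o $g1 -u $ACCOUNT
    done
  done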
> ________________________________
> From: Something Something <[email protected]>
> To: [email protected]
> Sent: Fri, December 11, 2009 9:50:30 AM
> Subject: Re: Starting HBase in fully distributed mode...
>
> 1) Yes, I used the same cluster name. Okay, let me try again tonight,
> but in any case, I was able to ssh to the master and confirm the setup.
>
> 2) I tried the Hadoop EC2 scripts last night. I keep getting 'Waiting
> for instance to start' and it seems to get stuck there. I also keep
> getting several messages like this:
>
> Required option '-K, --private-key KEY' missing (-h for usage)
>
> Seems like I haven't set *something* correctly. Will look into this
> tonight as well.
>
> 3) Not sure what you mean here. Yes, my Hadoop machines will be on
> EC2 as well.
>
> Here's my plan for the weekend:
>
> Start Hadoop instances on 10 EC2 machines.
> Start HBase on 5 EC2 machines, along with ZooKeeper on 5 machines.
> Start a MapReduce job on the Hadoop (master) instance.
>
> Problem is - I don't know what HBase configurations to use in my
> MapReduce program to point to HBase on another EC2 machine. Makes
> sense?
>
> On Fri, Dec 11, 2009 at 12:06 AM, Andrew Purtell <[email protected]> wrote:
>
> > > ./bin/hbase-ec2 login testcluster
> > >
> > > Use this to login. I tried running this from my local machine,
> > > but nothing *noteworthy* happened.
> >
> > Did you replace "testcluster" with the name you used when launching
> > your cluster, assuming they are different? The scripts address
> > clusters by the labels you give them when launching them. E.g.
> >
> > ./bin/hbase-ec2 launch foo 3 3
> >
> > launches a cluster named "foo", and
> >
> > ./bin/hbase-ec2 login foo
> >
> > opens an SSH shell on the master of cluster "foo".
> >
> > > Did you also create similar scripts for Hadoop?
> >
> > Hadoop has its own set of EC2 scripts. I used those as the basis
> > for ours. You can't use the HBase and Hadoop EC2 scripts together,
> > however.
> >
> > > Later I want to start a MapReduce job on my Hadoop machines that
> > > will access this HBase cluster. How would I do that?
> >
> > Are your Hadoop machines up on EC2 also?
> >
> > Running MapReduce jobs on the HBase cluster itself is a work in
> > progress.
> >
> > - Andy
> >
> > ________________________________
> > From: Something Something <[email protected]>
> > To: [email protected]
> > Sent: Thu, December 10, 2009 8:21:10 PM
> > Subject: Re: Starting HBase in fully distributed mode...
> >
> > Andy,
> >
> > Thanks for the tips. It's all working now. I was using a different
> > KeyPair for EC2_ROOT_SSH_KEY. Once I changed it to use root.pem, it
> > started working. I was able to ssh to the 'master' instance and get
> > into the hbase shell, etc. This script is VERY helpful! Thank you
> > so much.
> >
> > A few questions...
> >
> > 1) The README.txt file says this:
> >
> > ./bin/hbase-ec2 login testcluster
> >
> > Use this to login. I tried running this from my local machine, but
> > nothing *noteworthy* happened. I wasn't able to get into the hbase
> > shell from my local machine. Anyway, this is not a big deal for me.
> >
> > 2) Did you also create similar scripts for Hadoop? (I guess I will
> > look into the trunk!)
> >
> > 3) Say I use your script to start HBase on a few machines, and
> > start Hadoop on some other machines. Later I want to start a
> > MapReduce job on my Hadoop machines that will access this HBase
> > cluster. How would I do that? What HBase configurations can I use?
> > So far my MapReduce job always accesses HBase on the same machine.
> >
> > Thanks once again for your help.
> >
> > On Thu, Dec 10, 2009 at 5:30 PM, Vaibhav Puranik <[email protected]> wrote:
> >
> > > We have HBase running on EC2 with ZooKeeper started within HBase.
> > > It has been up since July 2009. No problems so far on the
> > > ZooKeeper front.
> > >
> > > Regards,
> > > Vaibhav Puranik
> > > Gumgum
> > >
> > > On Thu, Dec 10, 2009 at 8:12 AM, Something Something <[email protected]> wrote:
> > >
> > > > Finally, I was able to get HBase running on EC2 in fully
> > > > distributed mode. I started a ZooKeeper quorum myself and
> > > > pointed HBase to it. I was able to create tables using the
> > > > HBase shell, ran a MapReduce job that writes to these tables,
> > > > and ran queries against these tables. I used the HBase shell
> > > > from all 3 machines, and they all see the same data, confirming
> > > > that the instances are indeed working together.
> > > >
> > > > It seems like under EC2, starting ZooKeeper within HBase
> > > > doesn't work, but I could be wrong.
> > > >
> > > > In any case, Andrew, I would like to get your scripts working
> > > > in my environment, because without your scripts I don't know
> > > > how I would grow my cluster from 3 instances to, say, 30 :)
> > > >
> > > > Thank you so much everyone for your help and for sticking with
> > > > me.
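For reference, a minimal sketch of "pointing HBase at an external quorum" as described above. HBASE_MANAGES_ZK (in conf/hbase-env.sh) and the hbase.zookeeper.quorum property are the standard HBase settings for this; the hostnames are placeholders.

  # In conf/hbase-env.sh: tell HBase not to start/stop ZooKeeper itself.
  export HBASE_MANAGES_ZK=false

  # In conf/hbase-site.xml, inside the <configuration> element, list
  # the members of the externally managed quorum (placeholders shown):
  #
  #   <property>
  #     <name>hbase.zookeeper.quorum</name>
  #     <value>zk1.internal,zk2.internal,zk3.internal</value>
  #   </property>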
> > > >
> > > > On Wed, Dec 9, 2009 at 8:25 PM, Something Something <[email protected]> wrote:
> > > >
> > > > > When I run:
> > > > >
> > > > > hbase-ec2 launch-cluster testcluster 3 3
> > > > >
> > > > > I keep getting 'lost connection' messages (see below). Tried
> > > > > this 4 times. Please help. Thanks.
> > > > >
> > > > > -------------------------------------------------------------
> > > > >
> > > > > Creating/checking security groups
> > > > > Security group testcluster-master exists, ok
> > > > > Security group testcluster exists, ok
> > > > > Security group testcluster-zookeeper exists, ok
> > > > > Starting ZooKeeper quorum ensemble.
> > > > > Starting an AMI with ID ami-b0cb29d9 (arch i386) in group testcluster-zookeeper
> > > > > Waiting for instance i-9db6f4f5 to start: ..................
> > > > > Started ZooKeeper instance i-9db6f4f5 as domU-12-31-38-01-7D-D1.compute-1.internal
> > > > > Public DNS name is ec2-174-129-148-5.compute-1.amazonaws.com.
> > > > > Starting an AMI with ID ami-b0cb29d9 (arch i386) in group testcluster-zookeeper
> > > > > Waiting for instance i-2db7f545 to start: .................
> > > > > Started ZooKeeper instance i-2db7f545 as domU-12-31-38-01-7D-43.compute-1.internal
> > > > > Public DNS name is ec2-174-129-157-122.compute-1.amazonaws.com.
> > > > > Starting an AMI with ID ami-b0cb29d9 (arch i386) in group testcluster-zookeeper
> > > > > Waiting for instance i-afb7f5c7 to start: ......................
> > > > > Started ZooKeeper instance i-afb7f5c7 as domU-12-31-38-01-78-F3.compute-1.internal
> > > > > Public DNS name is ec2-174-129-179-14.compute-1.amazonaws.com.
> > > > > ZooKeeper quorum is domU-12-31-38-01-7D-D1.compute-1.internal,domU-12-31-38-01-7D-43.compute-1.internal,domU-12-31-38-01-78-F3.compute-1.internal.
> > > > > Initializing the ZooKeeper quorum ensemble.
> > > > > ec2-174-129-148-5.compute-1.amazonaws.com
> > > > > lost connection
> > > > > ec2-174-129-157-122.compute-1.amazonaws.com
> > > > > lost connection
> > > > > ec2-174-129-179-14.compute-1.amazonaws.com
> > > > > lost connection
> > > > >
> > > > > On Wed, Dec 9, 2009 at 12:46 AM, Seth Ladd <[email protected]> wrote:
> > > > >
> > > > >> > Sounds like others have used Andrew's script successfully.
> > > > >> > The only difference seems to be that it starts a
> > > > >> > *dedicated* ZooKeeper quorum. Should have listened to Mark
> > > > >> > when he suggested that 4 days ago :)
> > > > >> >
> > > > >> > Anyway, I will try Andrew's script tomorrow.
> > > > >>
> > > > >> I can vouch that the scripts in svn trunk work. Thanks to
> > > > >> Andrew for his help! I was able to start a 3 node ZooKeeper
> > > > >> and 5 node HBase cluster on EC2 from just the scripts.
> > > > >>
> > > > >> Seth
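The 'lost connection' failures in the transcript above are what the Dec 10 reply higher in this thread traced to an SSH key mismatch (EC2_ROOT_SSH_KEY pointing at the wrong KeyPair). A sketch of the environment the scripts and EC2 API tools expect; EC2_ROOT_SSH_KEY is named in this thread, EC2_PRIVATE_KEY and EC2_CERT are the standard EC2 API tool variables, and the paths are placeholders.

  # Standard EC2 API tool credentials; leaving these unset produces
  # errors like: Required option '-K, --private-key KEY' missing
  export EC2_PRIVATE_KEY=~/.ec2/pk-XXXXXXXX.pem
  export EC2_CERT=~/.ec2/cert-XXXXXXXX.pem

  # SSH key the hbase-ec2 scripts use to reach instances; it must match
  # the keypair the AMIs are launched with, or steps like 'Initializing
  # the ZooKeeper quorum ensemble' fail with 'lost connection'.
  export EC2_ROOT_SSH_KEY=~/.ec2/root.pem

  # Quick sanity check against a launched instance:
  ssh -i $EC2_ROOT_SSH_KEY root@<instance-public-dns> true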
