Andrew,

One thing I noticed with the ec2 scripts is that they don't update the 'slaves' file for Hadoop or the 'regionservers' file for HBase. As a result, when I stop HBase and Hadoop, the instances on the slaves don't stop. Not sure if others have experienced this.

Also, I believe this has been pointed out before, but it would be nice if the hbase jar and zookeeper jar were copied automatically to the hadoop lib directory, because they are needed by MapReduce.

Thanks.
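For anyone else hitting these two issues, a rough workaround sketch follows. It assumes the default /usr/local install paths on the AMIs and the standard EC2 API tools on the PATH; the cluster/security group name and the awk field positions for the ec2-describe-instances output are assumptions that may need adjusting for your setup.

  CLUSTER=testcluster

  # Rebuild the slave host lists from the instances running in the
  # cluster's security group, so the stop scripts can reach every node
  # over SSH.
  ec2-describe-instances | \
    awk -v g="$CLUSTER" '$1 == "RESERVATION" { grp = $4 }
         $1 == "INSTANCE" && grp == g && $6 == "running" { print $5 }' \
    > /tmp/$CLUSTER-hosts

  cp /tmp/$CLUSTER-hosts /usr/local/hadoop-*/conf/slaves
  cp /tmp/$CLUSTER-hosts /usr/local/hbase-*/conf/regionservers

  # Copy the HBase and ZooKeeper jars into Hadoop's lib directory so
  # MapReduce tasks can load them (repeat on every node, then restart
  # the TaskTrackers; jar names vary by release).
  cp /usr/local/hbase-*/hbase-*.jar /usr/local/hadoop-*/lib/
  cp /usr/local/hbase-*/lib/zookeeper-*.jar /usr/local/hadoop-*/lib/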
On Fri, Dec 11, 2009 at 10:10 AM, Andrew Purtell <[email protected]> wrote:

> > Problem is - I don't know what HBase configurations to use
> > in my MapReduce program to point to HBase on another EC2 machine.
>
> 1) Copy the hbase-site.xml from the HBase cluster master
> (/usr/local/hbase*/conf/hbase-site.xml) and put it on the classpath
> on your Hadoop cluster. Make sure you have a hbase-default.xml on
> the classpath on the Hadoop cluster also.
>
> 2) Make sure your Hadoop cluster instances can communicate with
> the HBase zookeeper, master, and slave security groups. Typically
> this means you have to execute a number of ec2-authorize commands
> of the form:
>
> ec2-authorize <group-1> -o <group-2> -u <account-id>
> ec2-authorize <group-2> -o <group-1> -u <account-id>
>
> where group-1 ranges over all of your Hadoop cluster's security
> groups, and group-2 over all of your HBase cluster's security
> groups. It's annoying, but you only have to do it once and the
> changes will persist in your security group ACLs.
>
> - Andy
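A concrete sketch of both steps, with placeholder values: the Hadoop group names and account id below are examples only (the HBase script creates the <cluster>, <cluster>-master, and <cluster>-zookeeper groups seen later in this thread), and the paths assume the AMI defaults.

  # 1) Put the HBase configuration on the Hadoop cluster's classpath.
  scp root@<hbase-master>:/usr/local/hbase-*/conf/hbase-site.xml \
      /usr/local/hadoop-*/conf/
  scp root@<hbase-master>:/usr/local/hbase-*/conf/hbase-default.xml \
      /usr/local/hadoop-*/conf/

  # 2) Authorize the security groups in both directions.
  ACCOUNT=<account-id>
  HADOOP_GROUPS="hadoopcluster hadoopcluster-master"
  HBASE_GROUPS="testcluster testcluster-master testcluster-zookeeper"

  for g1 in $HADOOP_GROUPS; do
    for g2 in $HBASE_GROUPS; do
      ec2-authorize $g1 -o $g2 -u $ACCOUNT
      ec2-authorize $g2 -o $g1 -u $ACCOUNT
    done
  done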
> ________________________________
> From: Something Something <[email protected]>
> To: [email protected]
> Sent: Fri, December 11, 2009 9:50:30 AM
> Subject: Re: Starting HBase in fully distributed mode...
>
> 1) Yes, I used the same cluster name. Okay, let me try again tonight,
> but in any case, I was able to ssh to the master and confirm the setup.
>
> 2) I tried the Hadoop EC2 scripts last night. I keep getting 'Waiting
> for instance to start' and it seems to get stuck there. I also keep
> getting several messages like this:
>
> Required option '-K, --private-key KEY' missing (-h for usage)
>
> Seems like I haven't set *something* correctly. Will look into this
> tonight as well.
>
> 3) Not sure what you mean here. Yes, my Hadoop machines will be on
> EC2 as well.
>
> Here's my plan for the weekend:
>
> Start Hadoop instances on 10 EC2 machines.
> Start HBase on 5 EC2 machines, along with ZooKeeper on 5 machines.
> Start a MapReduce job on the Hadoop (master) instance.
>
> Problem is - I don't know what HBase configurations to use in my
> MapReduce program to point to HBase on another EC2 machine. Makes
> sense?
>
> On Fri, Dec 11, 2009 at 12:06 AM, Andrew Purtell <[email protected]> wrote:
>
> > > ./bin/hbase-ec2 login testcluster
> > >
> > > Use this to login. I tried running this from my local machine,
> > > but nothing *noteworthy* happened.
> >
> > Did you replace "testcluster" with the name you used when launching
> > your cluster, assuming they are different? The scripts address
> > clusters by the labels you give them when launching them. E.g.
> >
> > ./bin/hbase-ec2 launch foo 3 3
> >
> > launches a cluster named "foo", and
> >
> > ./bin/hbase-ec2 login foo
> >
> > opens an SSH shell on the master of cluster "foo".
> >
> > > Did you also create similar scripts for Hadoop?
> >
> > Hadoop has its own set of EC2 scripts. I used those as the basis
> > for ours. You can't use the HBase and Hadoop EC2 scripts together,
> > however.
> >
> > > Later I want to start a MapReduce job on my Hadoop machines that
> > > will access this HBase cluster. How would I do that?
> >
> > Are your Hadoop machines up on EC2 also?
> >
> > Running MapReduce jobs on the HBase cluster itself is a work in
> > progress.
> >
> > - Andy
> >
> > ________________________________
> > From: Something Something <[email protected]>
> > To: [email protected]
> > Sent: Thu, December 10, 2009 8:21:10 PM
> > Subject: Re: Starting HBase in fully distributed mode...
> >
> > Andy,
> >
> > Thanks for the tips. It's all working now. I was using a different
> > KeyPair for EC2_ROOT_SSH_KEY. Once I changed it to use root.pem, it
> > started working. I was able to ssh to the 'master' instance and get
> > into the hbase shell, etc. This script is VERY helpful! Thank you
> > so much.
> >
> > A few questions...
> >
> > 1) The README.txt file says this:
> >
> > ./bin/hbase-ec2 login testcluster
> >
> > Use this to login. I tried running this from my local machine, but
> > nothing *noteworthy* happened. I wasn't able to get into the hbase
> > shell from my local machine. Anyway, this is not a big deal for me.
> >
> > 2) Did you also create similar scripts for Hadoop? (I guess I will
> > look into the trunk!)
> >
> > 3) Say I use your script to start HBase on a few machines, and
> > start Hadoop on some other machines. Later I want to start a
> > MapReduce job on my Hadoop machines that will access this HBase
> > cluster. How would I do that? What HBase configurations can I use?
> > So far my MapReduce job always accesses HBase on the same machine.
> >
> > Thanks once again for your help.
> >
> > On Thu, Dec 10, 2009 at 5:30 PM, Vaibhav Puranik <[email protected]> wrote:
> >
> > > We have HBase running on EC2 with ZooKeeper started within HBase.
> > > It has been up since July 2009. No problems so far on the
> > > ZooKeeper front.
> > >
> > > Regards,
> > > Vaibhav Puranik
> > > Gumgum
> > >
> > > On Thu, Dec 10, 2009 at 8:12 AM, Something Something <[email protected]> wrote:
> > >
> > > > Finally, I was able to get HBase running on EC2 in fully
> > > > distributed mode. I started a ZooKeeper quorum myself and
> > > > pointed HBase to it. I was able to create tables using the
> > > > HBase shell, ran a MapReduce job that writes to these tables,
> > > > and ran queries against these tables. I used the HBase shell
> > > > from all 3 machines, and they all see the same data, confirming
> > > > that the instances are indeed working together.
> > > >
> > > > It seems like under EC2, starting ZooKeeper within HBase
> > > > doesn't work, but I could be wrong.
> > > >
> > > > In any case, Andrew, I would like to get your scripts working
> > > > in my environment, because without your scripts I don't know
> > > > how I would grow my cluster from 3 instances to, say, 30 :)
> > > >
> > > > Thank you so much everyone for your help and for sticking with
> > > > me.
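For reference, a minimal sketch of "pointing HBase at an external quorum" as described above. HBASE_MANAGES_ZK (in conf/hbase-env.sh) and the hbase.zookeeper.quorum property are the standard HBase settings for this; the hostnames are placeholders.

  # In conf/hbase-env.sh: tell HBase not to start/stop ZooKeeper itself.
  export HBASE_MANAGES_ZK=false

  # In conf/hbase-site.xml, inside the <configuration> element, list
  # the members of the externally managed quorum (placeholders shown):
  #
  #   <property>
  #     <name>hbase.zookeeper.quorum</name>
  #     <value>zk1.internal,zk2.internal,zk3.internal</value>
  #   </property>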
> > > >
> > > > On Wed, Dec 9, 2009 at 8:25 PM, Something Something <[email protected]> wrote:
> > > >
> > > > > When I run:
> > > > >
> > > > > hbase-ec2 launch-cluster testcluster 3 3
> > > > >
> > > > > I keep getting 'lost connection' messages (see below). Tried
> > > > > this 4 times. Please help. Thanks.
> > > > >
> > > > > -------------------------------------------------------------
> > > > >
> > > > > Creating/checking security groups
> > > > > Security group testcluster-master exists, ok
> > > > > Security group testcluster exists, ok
> > > > > Security group testcluster-zookeeper exists, ok
> > > > > Starting ZooKeeper quorum ensemble.
> > > > > Starting an AMI with ID ami-b0cb29d9 (arch i386) in group testcluster-zookeeper
> > > > > Waiting for instance i-9db6f4f5 to start: ..................
> > > > > Started ZooKeeper instance i-9db6f4f5 as domU-12-31-38-01-7D-D1.compute-1.internal
> > > > > Public DNS name is ec2-174-129-148-5.compute-1.amazonaws.com.
> > > > > Starting an AMI with ID ami-b0cb29d9 (arch i386) in group testcluster-zookeeper
> > > > > Waiting for instance i-2db7f545 to start: .................
> > > > > Started ZooKeeper instance i-2db7f545 as domU-12-31-38-01-7D-43.compute-1.internal
> > > > > Public DNS name is ec2-174-129-157-122.compute-1.amazonaws.com.
> > > > > Starting an AMI with ID ami-b0cb29d9 (arch i386) in group testcluster-zookeeper
> > > > > Waiting for instance i-afb7f5c7 to start: ......................
> > > > > Started ZooKeeper instance i-afb7f5c7 as domU-12-31-38-01-78-F3.compute-1.internal
> > > > > Public DNS name is ec2-174-129-179-14.compute-1.amazonaws.com.
> > > > > ZooKeeper quorum is domU-12-31-38-01-7D-D1.compute-1.internal,domU-12-31-38-01-7D-43.compute-1.internal,domU-12-31-38-01-78-F3.compute-1.internal.
> > > > > Initializing the ZooKeeper quorum ensemble.
> > > > > ec2-174-129-148-5.compute-1.amazonaws.com
> > > > > lost connection
> > > > > ec2-174-129-157-122.compute-1.amazonaws.com
> > > > > lost connection
> > > > > ec2-174-129-179-14.compute-1.amazonaws.com
> > > > > lost connection
> > > > >
> > > > > On Wed, Dec 9, 2009 at 12:46 AM, Seth Ladd <[email protected]> wrote:
> > > > >
> > > > >> > Sounds like others have used Andrew's script successfully.
> > > > >> > The only difference seems to be that it starts a
> > > > >> > *dedicated* ZooKeeper quorum. Should have listened to Mark
> > > > >> > when he suggested that 4 days ago :)
> > > > >> >
> > > > >> > Anyway, I will try Andrew's script tomorrow.
> > > > >>
> > > > >> I can vouch that the scripts in svn trunk work. Thanks to
> > > > >> Andrew for his help! I was able to start a 3 node ZooKeeper
> > > > >> and 5 node HBase cluster on EC2 from just the scripts.
> > > > >>
> > > > >> Seth
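The 'lost connection' failures in the transcript above are what the Dec 10 reply higher in this thread traced to an SSH key mismatch (EC2_ROOT_SSH_KEY pointing at the wrong KeyPair). A sketch of the environment the scripts and EC2 API tools expect; EC2_ROOT_SSH_KEY is named in this thread, EC2_PRIVATE_KEY and EC2_CERT are the standard EC2 API tool variables, and the paths are placeholders.

  # Standard EC2 API tool credentials; leaving these unset produces
  # errors like: Required option '-K, --private-key KEY' missing
  export EC2_PRIVATE_KEY=~/.ec2/pk-XXXXXXXX.pem
  export EC2_CERT=~/.ec2/cert-XXXXXXXX.pem

  # SSH key the hbase-ec2 scripts use to reach instances; it must match
  # the keypair the AMIs are launched with, or steps like 'Initializing
  # the ZooKeeper quorum ensemble' fail with 'lost connection'.
  export EC2_ROOT_SSH_KEY=~/.ec2/root.pem

  # Quick sanity check against a launched instance:
  ssh -i $EC2_ROOT_SSH_KEY root@<instance-public-dns> true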
