>From the complete newbie perspective, there is a tendency to rush through the "Getting Started" guide to get HBase up and running quickly to start playing with it.
Assuming the issues I ran into are relatively common for distributed installs, I would say the main lessons I took away that weren't explicit in the "Getting Started" guide are the following: 1) Networking can be problematic. Host settings are important. 2) The distinction between ZooKeepers and RegionServers is important. 3) How to debug, or what to look for in the logs (zk_dump, status 'simple', prompts on start up, common hang-up messages in the logs) 4) More examples of correct configurations and what outputs of those debug routines should correctly say Obviously you can't tackle every edge case. Typically when I encounter an error I just Google it and find a random blog post indicating what was done to fix it. However the community seems to be young and growing still. On Wed, Nov 11, 2009 at 1:02 PM, stack <[email protected]> wrote: > If there is anything you thing we should improve in our 'Getting Started', > please say so. > St.Ack > > On Wed, Nov 11, 2009 at 9:57 AM, Chris Bates < > [email protected]> wrote: > > > Ok, thats what I was wondering. I started reading through the Hbase > chapter > > in OReilly's Hadoop book. > > > > All -- thank you for your help! I hope to do a short write up in a blog > I > > just started. Hopefully this discussion will be helpful to other new > > users. > > > > On Wed, Nov 11, 2009 at 12:41 PM, Lars George <[email protected]> > wrote: > > > > > Hi Chris, > > > > > > Please note that only one server gets to host the .META. table and > > another > > > the -ROOT- table. That is it. All user regions are then equally > > distributed > > > across all region servers. > > > > > > So it sounds like you are all set. Not sure what happened initially > with > > > the Zookeeper hang up. Keep an eye on it. > > > > > > Lars > > > > > > > > > On Nov 11, 2009, at 15:12, Chris Bates < > > [email protected]> > > > wrote: > > > > > > @Lars -- We're configured like in the "Getting Started" guide -- tried > > to > > >> follow the instructions verbatim. > > >> > > >> Ok. Thanks for the tip. I'm not sure what causes the orphan behavior, > > but > > >> thats good to know. I'll also know what to look for in the jstack. > > After > > >> stopping the region servers individually, then doing a stop / restart, > I > > >> get > > >> regionserver logs > > >> > > >> Here is M2 (chanel) > > >> Wed Nov 11 08:49:30 EST 2009 Starting regionserver on chanel > > >> ulimit -n 1024 > > >> 2009-11-11 08:49:32,081 INFO > > >> org.apache.hadoop.hbase.regionserver.HRegionServer: > > >> vmInputArguments=[-Xmx1000m, -XX:+HeapDumpOnOutOfMemoryError, > > >> -XX:+UseConcMarkSweepGC, -XX:+CMSIncrementalMode, -Dhbase.lo > > >> 2009-11-11 08:49:32,229 INFO > > >> org.apache.hadoop.hbase.regionserver.HRegionServer: My address is > > >> chanel.local:60020 > > >> 2009-11-11 08:49:32,418 INFO > > org.apache.hadoop.hbase.ipc.HBaseRpcMetrics: > > >> Initializing RPC Metrics with hostName=HRegionServer, port=60020 > > >> 2009-11-11 08:49:32,629 INFO > > >> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: > > >> globalMemStoreLimit=398.7m, globalMemStoreLimitLowMark=249.2m, > > >> maxHeap=996.8m > > >> 2009-11-11 08:49:32,641 INFO > > >> org.apache.hadoop.hbase.regionserver.HRegionServer: Runs every > > 10000000ms > > >> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client > > >> environment:zookeeper.version=3.2.1-808558, built on 08/27/2009 18:48 > > GMT > > >> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client > > >> environment:host.name=chanel.local > > >> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client > > >> environment:java.version=1.6.0_16 > > >> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client > > >> environment:java.vendor=Sun Microsystems Inc. > > >> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client > > >> environment:java.home=/usr/lib/jvm/java-6-sun-1.6.0.16/jre > > >> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client > > >> > > >> > > > environment:java.class.path=/opt/hadoop/hbase-0.20.1/bin/../conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/opt/hadoop/hbase-0.20.1/bin/..: > > >> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client > > >> > > >> > > > environment:java.library.path=/opt/hadoop/hbase-0.20.1/bin/../lib/native/Linux-i386-32 > > >> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client > > >> environment:java.io.tmpdir=/tmp > > >> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client > > >> environment:java.compiler=<NA> > > >> 2009-11-11 08:49:32,749 INFO org.apache.zookeeper.ZooKeeper: Client > > >> environment:os.name=Linux > > >> 2009-11-11 08:49:32,750 INFO org.apache.zookeeper.ZooKeeper: Client > > >> environment:os.arch=i386 > > >> 2009-11-11 08:49:32,750 INFO org.apache.zookeeper.ZooKeeper: Client > > >> environment:os.version=2.6.24-24-generic > > >> 2009-11-11 08:49:32,750 INFO org.apache.zookeeper.ZooKeeper: Client > > >> environment:user.name=hadoop > > >> 2009-11-11 08:49:32,750 INFO org.apache.zookeeper.ZooKeeper: Client > > >> environment:user.home=/home/hadoop > > >> 2009-11-11 08:49:32,750 INFO org.apache.zookeeper.ZooKeeper: Client > > >> environment:user.dir=/opt/hadoop/hbase-0.20.1 > > >> 2009-11-11 08:49:32,760 INFO org.apache.zookeeper.ZooKeeper: > Initiating > > >> client connection, > > >> > > >> > > > connectString=crunch2:2181,chanel2:2181,chanel:2181,chris:2181,crunch3:2181 > > >> sessionTimeout=60000 watcher=org.apa > > >> 2009-11-11 08:49:32,781 INFO org.apache.zookeeper.ClientCnxn: > > >> zookeeper.disableAutoWatchReset is false > > >> 2009-11-11 08:49:32,800 INFO org.apache.zookeeper.ClientCnxn: > Attempting > > >> connection to server crunch2/172.16.1.95:2181 > > >> 2009-11-11 08:49:32,802 INFO org.apache.zookeeper.ClientCnxn: Priming > > >> connection to java.nio.channels.SocketChannel[connected local=/ > > >> 172.16.1.45:33873 remote=crunch2/172.16.1.95:2181] > > >> 2009-11-11 08:49:32,814 INFO org.apache.zookeeper.ClientCnxn: Server > > >> connection successful > > >> 2009-11-11 08:49:32,829 INFO > > >> org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper > event, > > >> state: SyncConnected, type: None, path: null > > >> 2009-11-11 08:49:32,973 INFO > > >> org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at > > >> 172.16.1.46:60000 that we are up > > >> 2009-11-11 08:49:33,509 INFO > > >> org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us > > >> address > > >> to use. Was=172.16.1.45:60020, Now=172.16.1.45 > > >> 2009-11-11 08:49:34,090 INFO > org.apache.hadoop.hbase.regionserver.HLog: > > >> HLog > > >> configuration: blocksize=67108864, rollsize=63753420, enabled=true, > > >> flushlogentries=100, optionallogflushinternal=10000ms > > >> 2009-11-11 08:49:34,281 INFO > org.apache.hadoop.hbase.regionserver.HLog: > > >> New > > >> hlog > > /hbase/.logs/chanel.local,60020,1257947373328/hlog.dat.1257947374090 > > >> 2009-11-11 08:49:34,319 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: > > >> Initializing JVM Metrics with processName=RegionServer, > > >> sessionId=regionserver/172.16.1.45:60020 > > >> 2009-11-11 08:49:34,350 INFO > > >> org.apache.hadoop.hbase.regionserver.metrics.RegionServerMetrics: > > >> Initialized > > >> 2009-11-11 08:49:34,723 INFO org.apache.hadoop.http.HttpServer: Port > > >> returned by webServer.getConnectors()[0].getLocalPort() before open() > is > > >> -1. > > >> Opening the listener on 60030 > > >> 2009-11-11 08:49:34,724 INFO org.apache.hadoop.http.HttpServer: > > >> listener.getLocalPort() returned 60030 > > >> webServer.getConnectors()[0].getLocalPort() returned 60030 > > >> 2009-11-11 08:49:34,724 INFO org.apache.hadoop.http.HttpServer: Jetty > > >> bound > > >> to port 60030 > > >> 2009-11-11 08:49:35,673 INFO org.apache.hadoop.ipc.HBaseServer: IPC > > Server > > >> Responder: starting > > >> 2009-11-11 08:49:35,674 INFO org.apache.hadoop.ipc.HBaseServer: IPC > > Server > > >> listener on 60020: starting > > >> 2009-11-11 08:49:35,675 INFO org.apache.hadoop.ipc.HBaseServer: IPC > > Server > > >> handler 0 on 60020: starting > > >> 2009-11-11 08:49:35,675 INFO org.apache.hadoop.ipc.HBaseServer: IPC > > Server > > >> handler 1 on 60020: starting > > >> 2009-11-11 08:49:35,675 INFO org.apache.hadoop.ipc.HBaseServer: IPC > > Server > > >> handler 2 on 60020: starting > > >> 2009-11-11 08:49:35,675 INFO org.apache.hadoop.ipc.HBaseServer: IPC > > Server > > >> handler 3 on 60020: starting > > >> 2009-11-11 08:49:35,676 INFO org.apache.hadoop.ipc.HBaseServer: IPC > > Server > > >> handler 4 on 60020: starting > > >> 2009-11-11 08:49:35,676 INFO org.apache.hadoop.ipc.HBaseServer: IPC > > Server > > >> handler 5 on 60020: starting > > >> 2009-11-11 08:49:35,676 INFO org.apache.hadoop.ipc.HBaseServer: IPC > > Server > > >> handler 6 on 60020: starting > > >> 2009-11-11 08:49:35,676 INFO org.apache.hadoop.ipc.HBaseServer: IPC > > Server > > >> handler 7 on 60020: starting > > >> 2009-11-11 08:49:35,676 INFO org.apache.hadoop.ipc.HBaseServer: IPC > > Server > > >> handler 8 on 60020: starting > > >> 2009-11-11 08:49:35,676 INFO > > >> org.apache.hadoop.hbase.regionserver.HRegionServer: HRegionServer > > started > > >> at: 172.16.1.45:60020 > > >> 2009-11-11 08:49:35,677 INFO org.apache.hadoop.ipc.HBaseServer: IPC > > Server > > >> handler 9 on 60020: starting > > >> 2009-11-11 08:49:35,694 INFO > > >> org.apache.hadoop.hbase.regionserver.StoreFile: > > >> Allocating LruBlockCache with maximum size 199.4m > > >> 2009-11-11 08:49:38,798 INFO > > >> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: > > >> .META.,,1 > > >> 2009-11-11 08:49:38,811 INFO > > >> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: > > >> MSG_REGION_OPEN: > > >> .META.,,1 > > >> 2009-11-11 08:49:38,932 INFO > > org.apache.hadoop.hbase.regionserver.HRegion: > > >> region .META.,,1/1028785192 available; sequence id is 0 > > >> > > >> The only difference between that log file and the rest is that the > > others > > >> don't have > > >> 2009-11-11 08:49:38,798 INFO > > >> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: > > >> .META.,,1 > > >> 2009-11-11 08:49:38,811 INFO > > >> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: > > >> MSG_REGION_OPEN: > > >> .META.,,1 > > >> 2009-11-11 08:49:38,932 INFO > > org.apache.hadoop.hbase.regionserver.HRegion: > > >> region .META.,,1/1028785192 available; sequence id is 0 > > >> > > >> except M5 which has it and says id is 3....if you want me to paste all > > the > > >> logs, I can...didn't think it was necessary. > > >> > > >> hbase(main):001:0> zk_dump > > >> > > >> HBase tree in ZooKeeper is rooted at /hbase > > >> Cluster up? true > > >> In safe mode? false > > >> Master address: 172.16.1.46:60000 > > >> Region server holding ROOT: 172.16.1.96:60020 > > >> Region servers: > > >> - 172.16.1.95:60020 > > >> - 172.16.1.96:60020 > > >> - 172.16.1.83:60020 > > >> - 172.16.1.45:60020 > > >> hbase(main):002:0> status 'simple' > > >> 4 live servers > > >> crunch3.local:60020 1257928281006 > > >> requests=0, regions=1, usedHeap=31, maxHeap=998 > > >> chris.local:60020 1257947367197 > > >> requests=0, regions=0, usedHeap=29, maxHeap=996 > > >> crunch2.local:60020 1257947370011 > > >> requests=0, regions=0, usedHeap=29, maxHeap=998 > > >> chanel.local:60020 1257947373328 > > >> requests=0, regions=1, usedHeap=31, maxHeap=996 > > >> 0 dead servers > > >> > > >> > > >> I started and stopped Hbase a couple times without deleting the HDFS > > copy, > > >> and everything seems to work fine....was that it??? What caused the > > >> regionservers to orphan? > > >> > > >> On Wed, Nov 11, 2009 at 5:17 AM, Lars George <[email protected]> > > wrote: > > >> > > >> Hi Chris, > > >>> > > >>> As Tatsuya says, but I would expect you have to "kill" them as they > are > > >>> in > > >>> a state where a daemon stop may not work as they will not be > > perceptible > > >>> to > > >>> that event yet. > > >>> > > >>> I saw the same as Tatsuya and assumed you have an issue with your > > quorum > > >>> setting on the slaves. I have not followed the whole discussion in > all > > >>> the > > >>> details so let me ask you how you configured it? > > >>> > > >>> I have personally done a symlink of my standalone zoo.cfg from the > > >>> zookeeper/conf to the hbase/conf directory. You could also have set > the > > >>> quorum servers in the hbase-site.xml as per the "Getting started" > > guide. > > >>> > > >>> If you have done one of those ways already then make sure all ZK > > servers > > >>> are up and can be reached from the region servers. > > >>> > > >>> Lars > > >>> > > >>> Tatsuya Kawano schrieb: > > >>> > > >>> Hi Chris, and thanks Lars for help. > > >>> > > >>>> > > >>>> OK. So "jstack 22200" shows your region server is trying to finish > > >>>> starting up, but stuck in a middle when try to get IP address of the > > >>>> master from ZooKeeper. > > >>>> > > >>>> =========================================================== > > >>>> "main" prio=10 tid=0x0805a800 nid=0x56d2 waiting on condition > > >>>> [0xb72f2000] > > >>>> java.lang.Thread.State: TIMED_WAITING (sleeping) > > >>>> at java.lang.Thread.sleep(Native Method) > > >>>> at org.apache.hadoop.hbase.util.Sleeper.sleep(Sleeper.java:74) > > >>>> at org.apache.hadoop.hbase.util.Sleeper.sleep(Sleeper.java:51) > > >>>> at > > >>>> > > >>>> > > >>>> > > > org.apache.hadoop.hbase.regionserver.HRegionServer.watchMasterAddress(HRegionServer.java:387) > > >>>> at > > >>>> > > >>>> > > >>>> > > > org.apache.hadoop.hbase.regionserver.HRegionServer.reinitializeZooKeeper(HRegionServer.java:315) > > >>>> at > > >>>> > > >>>> > > >>>> > > > org.apache.hadoop.hbase.regionserver.HRegionServer.reinitialize(HRegionServer.java:306) > > >>>> at > > >>>> > > >>>> > > >>>> > > > org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:276) > > >>>> =========================================================== > > >>>> > > >>>> I still need to see the regionserver logs to figure out why this is > > >>>> happening. > > >>>> > > >>>> > > >>>> Also, > > >>>> > > >>>> > > >>>> > > >>>> This is what I see when I run start-hbase.sh -- I can ssh into any > of > > >>>>> the > > >>>>> boxes with no password just fine, it just gives me a weird first > time > > >>>>> host > > >>>>> message...we get the same thing when we start up hadoop. > > >>>>> > > >>>>> > > >>>>> ... > > >>>> > > >>>> > > >>>> crunch2: regionserver running as process 6950. Stop it first. > > >>>>> chanel: regionserver running as process 22200. Stop it first. > > >>>>> crunch3: regionserver running as process 28962. Stop it first. > > >>>>> chris: regionserver running as process 28719. Stop it first. > > >>>>> > > >>>>> > > >>>>> > > >>>> This "Stop it first" message means your region servers didn't stop > > >>>> when you ran stop-hbase.sh. The master couldn't locate those region > > >>>> servers so it couldn't tell them to shutdown. This is why you've got > > >>>> those orphan region servers. So until we finish setting your HBase > > >>>> cluster up, you'll have to stop those region servers by hand. > > >>>> > > >>>> To do this, ssh to M2 -- M5, and type the following command: > > >>>> ${HBASE_HOME}/bin/hbase-daemon.sh stop regionserver > > >>>> > > >>>> Then jps again to make sure HRegionServer doesn't exist. If the > above > > >>>> command doesn't work, you can use Unix "kill" command. Then ssh to > M1, > > >>>> run stop-hbase.sh to stop the master and ZooKeepers. > > >>>> > > >>>> > > >>>> It's still a mystery you don't have regionserver logs while you have > > >>>> zookeeper logs. Maybe those orphan region servers was the reason? I > > >>>> don't know, but you can give it another try after stopping them. So, > > >>>> try to stop whole HBase / ZooKeeper process by above way, then run > > >>>> start-hbase.sh once again. If you can get the regionserver log to > us, > > >>>> that would be great. > > >>>> > > >>>> Thanks, > > >>>> > > >>>> > > >>>> > > >>>> > > >>> > > >
