This particular problem is fixed in the current 0.20 branch, and we just released a candidate for 0.20.2. You can get it here: http://people.apache.org/~jdcryans/hbase-0.20.2-candidate-1/
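To check whether a region server died from this same abort before upgrading, something like the following could be run over its log. This is only a rough sketch: the function name `find_fatal_aborts` is made up for illustration, the sample text is copied from the log quoted below with the mail wrapping undone, and the regex assumes the log4j layout seen in that log (timestamp, level, logger, message).

```python
import re

# Start of a log entry in the layout seen in the quoted log, e.g.
# "2009-11-10 18:09:08,302 FATAL <logger>: Unhandled exception. Aborting..."
ENTRY = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+) (\S+): (.*)$")


def find_fatal_aborts(log_text):
    """Return a list of (timestamp, message, trace_lines) for each FATAL entry.

    Lines that do not start a new entry (the exception name and the
    "at ..." frames) are attached to the most recent FATAL entry.
    """
    aborts = []
    current = None  # trace list of the FATAL entry being collected, if any
    for line in log_text.splitlines():
        match = ENTRY.match(line)
        if match:
            timestamp, level, _logger, message = match.groups()
            if level == "FATAL":
                current = []
                aborts.append((timestamp, message, current))
            else:
                current = None  # a non-FATAL entry ends the trace
        elif current is not None and line.strip():
            current.append(line.strip())
    return aborts


# Sample taken from the quoted region server log (wrapping undone).
SAMPLE = """\
2009-11-10 18:09:08,255 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 10.148.224.13:60000 that we are up
2009-11-10 18:09:08,302 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Unhandled exception. Aborting...
java.lang.NullPointerException
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:459)
    at java.lang.Thread.run(Thread.java:619)
2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
"""

for timestamp, message, trace in find_fatal_aborts(SAMPLE):
    print(timestamp, message)
    for frame in trace:
        print("   ", frame)
```

If the trace shows the NullPointerException in `HRegionServer.run` right after "Telling master at ... that we are up", it is most likely the same problem as the one in the log below.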
J-D

On Tue, Nov 10, 2009 at 5:43 PM, Jeff Zhang <[email protected]> wrote:
> The following is the region server's log :
>
>
> 2009-11-10 18:09:08,062 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 3 on 60020: starting
> 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 4 on 60020: starting
> 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 5 on 60020: starting
> 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 6 on 60020: starting
> 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 7 on 60020: starting
> 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 8 on 60020: starting
> 2009-11-10 18:09:08,063 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: HRegionServer started
> at: 10.148.224.11:60020
> 2009-11-10 18:09:08,064 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 9 on 60020: starting
> 2009-11-10 18:09:08,070 INFO org.apache.hadoop.hbase.regionserver.StoreFile:
> Allocating LruBlockCache with maximum size 198.3m
> 2009-11-10 18:09:08,095 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_CALL_SERVER_STARTUP
> 2009-11-10 18:09:08,229 INFO org.apache.hadoop.hbase.regionserver.HLog: HLog
> configuration: blocksize=67108864, rollsize=63753420, enabled=true,
> flushlogentries=100, optionallogflushinternal=10000ms
> 2009-11-10 18:09:08,253 INFO org.apache.hadoop.hbase.regionserver.HLog: New
> hlog /hbase/.logs/10.148.224.11,60020,1257847748205/hlog.dat.1257847748229
> 2009-11-10 18:09:08,255 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at
> 10.148.224.13:60000 that we are up
> 2009-11-10 18:09:08,302 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: Unhandled exception.
> Aborting...
> java.lang.NullPointerException
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:459)
> at java.lang.Thread.run(Thread.java:619)
> 2009-11-10 18:09:08,304 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
> request=0.0, regions=0, stores=0, storefiles=0, storefileIndexSize=0,
> memstoreSize=0, usedHeap=31, maxHeap=99
> 1, blockCacheSize=1707288, blockCacheFree=206264664, blockCacheCount=0,
> blockCacheHitRatio=0
> 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
> server on 60020
> 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 0 on 60020: exiting
> 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
> Server listener on 60020
> 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 1 on 60020: exiting
> 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 2 on 60020: exiting
> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 3 on 60020: exiting
> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 4 on 60020: exiting
> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 5 on 60020: exiting
> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 6 on 60020: exiting
> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 7 on 60020: exiting
> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 8 on 60020: exiting
> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 9 on 60020: exiting
> 2009-11-10 18:09:08,306 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
> 2009-11-10 18:09:08,307 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
> Server Responder
> 2009-11-10 18:09:08,412 INFO
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher:
> regionserver/127.0.0.1:60020.cacheFlusher exiting
> 2009-11-10 18:09:08,412 INFO
> org.apache.hadoop.hbase.regionserver.LogFlusher:
> regionserver/127.0.0.1:60020.logFlusher exiting
> 2009-11-10 18:09:08,412 INFO
> org.apache.hadoop.hbase.regionserver.CompactSplitThread:
> regionserver/127.0.0.1:60020.compactor exiting
> 2009-11-10 18:09:08,412 INFO org.apache.hadoop.hbase.regionserver.LogRoller:
> LogRoller exiting.
> 2009-11-10 18:09:08,413 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker:
> regionserver/127.0.0.1:60020.majorCompactionChecker exiting
> 2009-11-10 18:09:08,427 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: On abort, closed hlog
> 2009-11-10 18:09:08,428 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at:
> 10.148.224.11:60020
> 2009-11-10 18:09:17,489 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
> 2009-11-10 18:09:17,489 INFO org.apache.zookeeper.ZooKeeper: Closing
> session: 0x324dcceb05c0003
> 2009-11-10 18:09:17,490 INFO org.apache.zookeeper.ClientCnxn: Closing
> ClientCnxn for session: 0x324dcceb05c0003
> 2009-11-10 18:09:17,495 INFO org.apache.hadoop.hbase.Leases:
> regionserver/127.0.0.1:60020.leaseChecker closing leases
> 2009-11-10 18:09:17,495 INFO org.apache.hadoop.hbase.Leases:
> regionserver/127.0.0.1:60020.leaseChecker closed leases
> 2009-11-10 18:09:17,500 INFO org.apache.zookeeper.ClientCnxn: Exception
> while closing send thread for session 0x324dcceb05c0003 : Read error rc = -1
> java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
> 2009-11-10 18:09:17,604 INFO org.apache.zookeeper.ClientCnxn: Disconnecting
> ClientCnxn for session: 0x324dcceb05c0003
> 2009-11-10 18:09:17,604 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x324dcceb05c0003 closed
> 2009-11-10 18:09:17,605 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/
> 127.0.0.1:60020 exiting
> 2009-11-10 18:09:17,605 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> 2009-11-10 18:09:17,606 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
> thread.
> 2009-11-10 18:09:17,606 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
>
> On Tue, Nov 10, 2009 at 10:55 PM, Andrew Purtell <[email protected]>wrote:
>
>> When you try to start the region servers, what do you see in the log?
>>
>> If you don't change the client port (hbase.zookeeper.property.clientPort),
>> does it work?
>>
>> - Andy
>>
>>
>>
>>
>>
>> ________________________________
>> From: Jeff Zhang <[email protected]>
>> To: [email protected]
>> Sent: Tue, November 10, 2009 2:40:28 PM
>> Subject: Re: HBase 0.20.1 Distributed Install Problems
>>
>> Hi,
>>
>> I meet the same problem that I can not start the regionserver.
>>
>> When I invoke zk_dump
>>
>> it shows:
>>
>> HBase tree in ZooKeeper is rooted at /hbase
>> Cluster up? true
>> In safe mode? true
>> Master address: 10.148.224.13:60000
>> Region server holding ROOT: null
>> Region servers:
>>
>>
>> The following is my hbase-site.xml
>>
>> <configuration>
>> <property>
>> <name>hbase.cluster.distributed</name>
>> <value>true</value>
>> <description>The mode the cluster will be in. Possible values are
>> false: standalone and pseudo-distributed setups with managed Zookeeper
>> true: fully-distributed with unmanaged Zookeeper Quorum (see
>> hbase-env.sh)
>> </description>
>> </property>
>> <property>
>> <name>hbase.rootdir</name>
>> <value>hdfs://sha-cs-04:9000/hbase</value>
>> <description>The directory shared by region servers.
>> </description>
>> </property>
>> <property>
>> <name>hbase.zookeeper.property.clientPort</name>
>> <value>2222</value>
>> <description>Property from ZooKeeper's config zoo.cfg.
>> The port at which the clients will connect.
>> </description>
>> </property>
>> <property>
>> <name>hbase.zookeeper.quorum</name>
>> <value>sha-cs-01,sha-cs-02,sha-cs-03,sha-cs-05,sha-cs-06</value>
>> <description>Comma separated list of servers in the ZooKeeper Quorum.
>> For example, "host1.mydomain.com,host2.mydomain.com,
>> host3.mydomain.com
>> ".
>> By default this is set to localhost for local and pseudo-distributed
>> modes
>> of operation. For a fully-distributed setup, this should be set to a
>> full
>> list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in
>> hbase-env.sh
>> this is the list of servers which we will start/stop ZooKeeper on.
>> </description>
>> </property>
>>
>> </configuration>
>>
>> What's wrong with my configuration ?
>>
>>
>> Thank you in advance.
>>
>>
>> Jeff Zhang
>>
>>
>>
>> On Tue, Nov 10, 2009 at 12:47 PM, Tatsuya Kawano
>> <[email protected]>wrote:
>>
>> > Hello,
>> >
>> > It looks like the master and the region servers are cannot locate each
>> > other. HBase 0.20.x uses ZooKeeper (zk) to locate other cluster
>> > members, so maybe your zk has wrong information.
>> >
>> > Can you type zk_dump from hbase shell and let us the result?
>> >
>> > If the cluster is properly configured, you'll get something like this:
>> > =====================================
>> > hbase(main):007:0> zk_dump
>> >
>> > HBase tree in ZooKeeper is rooted at /hbase
>> > Cluster up? true
>> > In safe mode? false
>> > Master address: 172.16.80.26:60000
>> > Region server holding ROOT: 172.16.80.27:60020
>> > Region servers:
>> > - 172.16.80.27:60020
>> > - 172.16.80.29:60020
>> > - 172.16.80.28:60020
>> > =====================================
>> >
>> >
>> > > one of my co-workers apparently can log into his box and submit jobs,
>> but
>> > > me or anyone else is still unable to log in.
>> >
>> > Maybe you're a bit confused; your co-worker seems to be able to use
>> > Hadoop Map/Reduce, not HBase.
>> >
>> >
>> > > Does Hbase allow concurrent connections?
>> >
>> > Yes.
>> >
>> >
>> > >> I think it also says the master is on port 60000
>> > >> when the install directions say its supposed to be 60010?
>> >
>> > Port 60000 is correct. The master uses port 60000 to accept connection
>> > from hbase shell and region servers. Port 60010 is for the web-based
>> > HBase console.
>> >
>> >
>> > > We tried applying this fix (to explicitly set the master):
>> > > http://osdir.com/ml/hbase-user-hadoop-apache/2009-05/msg00321.html
>> >
>> > No, this is an old way to configure a cluster. You shouldn't use this
>> > with HBase 0.20.x
>> >
>> >
>> > Thanks,
>> >
>> > --
>> > Tatsuya Kawano (Mr.)
>> > Tokyo, Japan
>> >
>> >
>> >
>> > On Tue, Nov 10, 2009 at 1:10 PM, Chris Bates
>> > <[email protected]> wrote:
>> > > Another interesting data point. We tried applying this fix (to
>> > explicitly
>> > > set the master):
>> > > http://osdir.com/ml/hbase-user-hadoop-apache/2009-05/msg00321.html
>> > >
>> > > But when I log in to the master node, it takes really long to submit a
>> > query
>> > > and I get this in response:
>> > > hbase(main):001:0> list
>> > > NativeException:
>> > org.apache.hadoop.hbase.client.RetriesExhaustedException:
>> > > Trying to contact region server null for region , row '', but failed
>> > after 5
>> > > attempts.
>> > > Exceptions:
>> > > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
>> > trying
>> > > to locate root region
>> > > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
>> > trying
>> > > to locate root region
>> > > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
>> > trying
>> > > to locate root region
>> > > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
>> > trying
>> > > to locate root region
>> > > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
>> > trying
>> > > to locate root region
>> > >
>> > > from org/apache/hadoop/hbase/client/HConnectionManager.java:1001:in
>> > > `getRegionServerWithRetries'
>> > > from org/apache/hadoop/hbase/client/MetaScanner.java:55:in `metaScan'
>> > > from org/apache/hadoop/hbase/client/MetaScanner.java:28:in `metaScan'
>> > > from org/apache/hadoop/hbase/client/HConnectionManager.java:432:in
>> > > `listTables'
>> > > from org/apache/hadoop/hbase/client/HBaseAdmin.java:127:in `listTables'
>> > > from sun/reflect/NativeMethodAccessorImpl.java:-2:in `invoke0'
>> > > from sun/reflect/NativeMethodAccessorImpl.java:39:in `invoke'
>> > > from sun/reflect/DelegatingMethodAccessorImpl.java:25:in `invoke'
>> > > from java/lang/reflect/Method.java:597:in `invoke'
>> > > from org/jruby/javasupport/JavaMethod.java:298:in
>> > > `invokeWithExceptionHandling'
>> > > from org/jruby/javasupport/JavaMethod.java:259:in `invoke'
>> > > from org/jruby/java/invokers/InstanceMethodInvoker.java:36:in `call'
>> > > from org/jruby/runtime/callsite/CachingCallSite.java:253:in
>> > `cacheAndCall'
>> > > from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
>> > > from org/jruby/ast/CallNoArgNode.java:61:in `interpret'
>> > > from org/jruby/ast/ForNode.java:104:in `interpret'
>> > > ... 116 levels...
>> > > from
>> > >
>> opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb#start:-1:in
>> > > `call'
>> > > from org/jruby/internal/runtime/methods/DynamicMethod.java:226:in
>> `call'
>> > > from org/jruby/internal/runtime/methods/CompiledMethod.java:211:in
>> `call'
>> > > from org/jruby/internal/runtime/methods/CompiledMethod.java:71:in
>> `call'
>> > > from org/jruby/runtime/callsite/CachingCallSite.java:253:in
>> > `cacheAndCall'
>> > > from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
>> > > from
>> > opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb.rb:497:in
>> > > `__file__'
>> > > from
>> > opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb.rb:-1:in
>> > > `load'
>> > > from org/jruby/Ruby.java:577:in `runScript'
>> > > from org/jruby/Ruby.java:480:in `runNormally'
>> > > from org/jruby/Ruby.java:354:in `runFromMain'
>> > > from org/jruby/Main.java:229:in `run'
>> > > from org/jruby/Main.java:110:in `run'
>> > > from org/jruby/Main.java:94:in `main'
>> > > from /opt/hadoop/hbase-0.20.1/bin/../bin/hirb.rb:338:in `list'
>> > > from (hbase):2hbase(main):002:0>
>> > >
>> > >
>> > > On Mon, Nov 9, 2009 at 10:52 PM, Chris Bates <
>> > > [email protected]> wrote:
>> > >
>> > >> thanks for your response Sujee. These boxes are all on an internal
>> DNS
>> > and
>> > >> they all resolve.
>> > >>
>> > >> one of my co-workers apparently can log into his box and submit jobs,
>> > but
>> > >> me or anyone else is still unable to log in. Does Hbase allow
>> > concurrent
>> > >> connections? In Hive I remember having to configure the metastore to
>> be
>> > in
>> > >> server mode if multiple people were using it.
>> > >>
>> > >>
>> > >> On Mon, Nov 9, 2009 at 10:13 PM, Sujee Maniyam <[email protected]>
>> wrote:
>> > >>
>> > >>> > [had...@crunch hbase-0.20.1]$ bin/start-hbase.sh
>> > >>> >
>> > >>> > crunch2: Warning: Permanently added 'crunch2' (RSA) to the list of
>> > known
>> > >>> > hosts.
>> > >>>
>> > >>>
>> > >>> is your SSH setup correctly? From master, you need to be able to
>> > >>> login to all slaves/regionservers without password
>> > >>>
>> > >>> And I see you are using short hostnames (crunch2, crunch3), do they
>> > >>> all resolve correctly? or you need to update /etc/hosts to resolve
>> > >>> these to an IP address on all machines.
>> > >>>
>> > >>> regards
>> > >>> Sujee Maniyam
>> > >>> --
>> > >>> http://sujee.net
>> > >>>
>> > >>
>> > >>
>> > >
>> >
>>
>>
>>
>>
>>
>
