Re: HBase 0.20.1 Distributed Install Problems

Jean-Daniel Cryans Tue, 10 Nov 2009 20:57:22 -0800

Please read my answer to Chris (written about 10-15 minutes ago); you
also seem to be confusing region servers with ZooKeeper quorum members.


In this case it also seems that some region servers first registered
themselves as localhost and then again under the correct address that the
master probably gave them. Please check your OS network configuration and
make sure the hostname resolves to the right address.
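A quick way to act on this is to check, on each region server host, what the machine's hostname resolves to. The Python sketch below is an illustration only: the helper names are hypothetical, not part of HBase, and it relies only on the standard socket and ipaddress modules. It flags a name that resolves anywhere in 127.0.0.0/8, which includes the 127.0.1.1 entry that some Debian/Ubuntu installs put in /etc/hosts for the machine's own name.

```python
import ipaddress
import socket
from typing import Optional


def resolve(name: str) -> Optional[str]:
    """Return the IPv4 address a name resolves to, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except socket.gaierror:
        return None


def is_loopback(addr: str) -> bool:
    """True for any address in 127.0.0.0/8 (e.g. 127.0.0.1 or 127.0.1.1)."""
    return ipaddress.ip_address(addr).is_loopback


def check_own_hostname(name: Optional[str] = None) -> str:
    """Describe how this machine's hostname (or the given name) resolves.

    Hypothetical helper: a region server whose hostname resolves to a
    loopback address may end up registering itself as localhost.
    """
    name = name or socket.gethostname()
    addr = resolve(name)
    if addr is None:
        return f"{name}: does not resolve"
    if is_loopback(addr):
        return f"{name}: resolves to loopback {addr} (fix DNS or /etc/hosts)"
    return f"{name}: resolves to {addr}"
```

Calling `check_own_hostname()` with no argument on each region server host should report a routable address; a loopback result on a host would be consistent with the localhost entries in the status output.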

J-D

On Tue, Nov 10, 2009 at 8:47 PM, Jeff Zhang <[email protected]> wrote:
> Hi Jean,
>
> I tried HBase 0.20.2 and looked at the logs; it seems the master and the
> region servers work.
>
> But I cannot run the list command in the HBase shell. When I run status
> 'simple' in the HBase shell, it shows the following:
> 09/11/11 12:42:55 DEBUG client.HConnectionManager$ClientZKWatcher: Got ZooKeeper event, state: SyncConnected, type: None, path: null
> 09/11/11 12:42:55 DEBUG zookeeper.ZooKeeperWrapper: Read ZNode /hbase/master got 10.148.224.13:60000
> 8 servers, 0 dead, 0.1250 average load
> hbase(main):002:0> status 'simple'
> 8 live servers
>    localhost:60020 1257914319445
>        requests=0, regions=0, usedHeap=0, maxHeap=0
>    sha-cs-03:60020 1257914321331
>        requests=0, regions=0, usedHeap=33, maxHeap=991
>    localhost:60020 1257914320265
>        requests=0, regions=0, usedHeap=0, maxHeap=0
>    sha-cs-01:60020 1257914320551
>        requests=0, regions=1, usedHeap=34, maxHeap=991
>    sha-cs-05:60020 1257914322656
>        requests=0, regions=0, usedHeap=33, maxHeap=991
>    sha-cs-06:60020 1257914321467
>        requests=0, regions=0, usedHeap=34, maxHeap=991
>    localhost:60020 1257914320202
>        requests=0, regions=0, usedHeap=0, maxHeap=0
>    localhost:60020 1257914321532
>        requests=0, regions=0, usedHeap=0, maxHeap=0
>
>
> It's weird that I have 4 localhost ZooKeeper entries here; I actually set 5
> machines in hbase.zookeeper.quorum.
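Note that every entry in that listing uses port 60020, the region server port, so these are region servers rather than ZooKeeper quorum members, and four of the eight report themselves as localhost. A small Python sketch like the one below (hypothetical helpers, not an HBase tool) can pull the host:port entries out of pasted `status 'simple'` output and flag loopback names or hosts listed more than once. It assumes the number after host:port is the server's start code and only uses it to tell entries apart.

```python
from __future__ import annotations

import re
from collections import Counter

# Matches server lines such as "   sha-cs-03:60020 1257914321331"
# (host:port followed by what appears to be the server start code).
SERVER_LINE = re.compile(r"^\s*(?P<host>[\w.-]+):(?P<port>\d+)\s+(?P<startcode>\d+)\s*$")


def parse_status_simple(text: str) -> list[tuple[str, int, int]]:
    """Extract (host, port, startcode) tuples from `status 'simple'` output."""
    servers = []
    for line in text.splitlines():
        match = SERVER_LINE.match(line)
        if match:
            servers.append(
                (match["host"], int(match["port"]), int(match["startcode"]))
            )
    return servers


def suspicious_hosts(servers: list[tuple[str, int, int]]) -> dict[str, int]:
    """Hosts that look like loopback names or appear more than once."""
    counts = Counter(host for host, _, _ in servers)
    return {
        host: n
        for host, n in counts.items()
        if host == "localhost" or host.startswith("127.") or n > 1
    }
```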
>
>
>
> Jeff Zhang
>
>
>
>
> On Wed, Nov 11, 2009 at 9:47 AM, Jean-Daniel Cryans <[email protected]> wrote:
>
>> This particular problem is fixed in the current 0.20 branch, and we
>> just released a release candidate for 0.20.2. You can get it here:
>> http://people.apache.org/~jdcryans/hbase-0.20.2-candidate-1/
>>
>> J-D
>>
>> On Tue, Nov 10, 2009 at 5:43 PM, Jeff Zhang <[email protected]> wrote:
>> > The following is the region server's log:
>> >
>> >
>> > 2009-11-10 18:09:08,062 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020: starting
>> > 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60020: starting
>> > 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60020: starting
>> > 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60020: starting
>> > 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60020: starting
>> > 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60020: starting
>> > 2009-11-10 18:09:08,063 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: HRegionServer started at: 10.148.224.11:60020
>> > 2009-11-10 18:09:08,064 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020: starting
>> > 2009-11-10 18:09:08,070 INFO org.apache.hadoop.hbase.regionserver.StoreFile: Allocating LruBlockCache with maximum size 198.3m
>> > 2009-11-10 18:09:08,095 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_CALL_SERVER_STARTUP
>> > 2009-11-10 18:09:08,229 INFO org.apache.hadoop.hbase.regionserver.HLog: HLog configuration: blocksize=67108864, rollsize=63753420, enabled=true, flushlogentries=100, optionallogflushinternal=10000ms
>> > 2009-11-10 18:09:08,253 INFO org.apache.hadoop.hbase.regionserver.HLog: New hlog /hbase/.logs/10.148.224.11,60020,1257847748205/hlog.dat.1257847748229
>> > 2009-11-10 18:09:08,255 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 10.148.224.13:60000 that we are up
>> > 2009-11-10 18:09:08,302 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Unhandled exception. Aborting...
>> > java.lang.NullPointerException
>> >        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:459)
>> >        at java.lang.Thread.run(Thread.java:619)
>> > 2009-11-10 18:09:08,304 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=0, stores=0, storefiles=0, storefileIndexSize=0, memstoreSize=0, usedHeap=31, maxHeap=991, blockCacheSize=1707288, blockCacheFree=206264664, blockCacheCount=0, blockCacheHitRatio=0
>> > 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
>> > 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60020: exiting
>> > 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60020
>> > 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020: exiting
>> > 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60020: exiting
>> > 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020: exiting
>> > 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60020: exiting
>> > 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60020: exiting
>> > 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60020: exiting
>> > 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60020: exiting
>> > 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60020: exiting
>> > 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020: exiting
>> > 2009-11-10 18:09:08,306 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
>> > 2009-11-10 18:09:08,307 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
>> > 2009-11-10 18:09:08,412 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: regionserver/127.0.0.1:60020.cacheFlusher exiting
>> > 2009-11-10 18:09:08,412 INFO org.apache.hadoop.hbase.regionserver.LogFlusher: regionserver/127.0.0.1:60020.logFlusher exiting
>> > 2009-11-10 18:09:08,412 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: regionserver/127.0.0.1:60020.compactor exiting
>> > 2009-11-10 18:09:08,412 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
>> > 2009-11-10 18:09:08,413 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker: regionserver/127.0.0.1:60020.majorCompactionChecker exiting
>> > 2009-11-10 18:09:08,427 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: On abort, closed hlog
>> > 2009-11-10 18:09:08,428 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at: 10.148.224.11:60020
>> > 2009-11-10 18:09:17,489 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
>> > 2009-11-10 18:09:17,489 INFO org.apache.zookeeper.ZooKeeper: Closing session: 0x324dcceb05c0003
>> > 2009-11-10 18:09:17,490 INFO org.apache.zookeeper.ClientCnxn: Closing ClientCnxn for session: 0x324dcceb05c0003
>> > 2009-11-10 18:09:17,495 INFO org.apache.hadoop.hbase.Leases: regionserver/127.0.0.1:60020.leaseChecker closing leases
>> > 2009-11-10 18:09:17,495 INFO org.apache.hadoop.hbase.Leases: regionserver/127.0.0.1:60020.leaseChecker closed leases
>> > 2009-11-10 18:09:17,500 INFO org.apache.zookeeper.ClientCnxn: Exception while closing send thread for session 0x324dcceb05c0003 : Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
>> > 2009-11-10 18:09:17,604 INFO org.apache.zookeeper.ClientCnxn: Disconnecting ClientCnxn for session: 0x324dcceb05c0003
>> > 2009-11-10 18:09:17,604 INFO org.apache.zookeeper.ZooKeeper: Session: 0x324dcceb05c0003 closed
>> > 2009-11-10 18:09:17,605 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/127.0.0.1:60020 exiting
>> > 2009-11-10 18:09:17,605 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>> > 2009-11-10 18:09:17,606 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
>> > 2009-11-10 18:09:17,606 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
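This log appears to show the mix J-D describes: the server reports itself as 10.148.224.11:60020, while several of its threads are named regionserver/127.0.0.1:60020. A hedged Python sketch for spotting that combination in a region server log follows; the regex and function names are illustrative, and it only considers dotted IPv4 host:port pairs.

```python
from __future__ import annotations

import ipaddress
import re

# Dotted IPv4 address followed by ":port", e.g. "10.148.224.11:60020".
ADDR_PORT = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b")


def addresses_for_port(log_text: str, port: int) -> set[str]:
    """All IPv4 addresses that appear in the log paired with the given port."""
    found = set()
    for ip, p in ADDR_PORT.findall(log_text):
        if int(p) != port:
            continue
        try:
            ipaddress.ip_address(ip)
        except ValueError:
            continue  # e.g. "999.1.1.1" is not a valid address
        found.add(ip)
    return found


def mixed_identity(log_text: str, port: int = 60020) -> bool:
    """True if the port shows up with both a loopback and a non-loopback address."""
    addrs = addresses_for_port(log_text, port)
    loopback = {a for a in addrs if ipaddress.ip_address(a).is_loopback}
    return bool(loopback) and bool(addrs - loopback)
```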
>> >
>> > On Tue, Nov 10, 2009 at 10:55 PM, Andrew Purtell <[email protected]> wrote:
>> >
>> >> When you try to start the region servers, what do you see in the log?
>> >>
>> >> If you don't change the client port (hbase.zookeeper.property.clientPort),
>> >> does it work?
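Andrew's question can be checked directly from a region server host: ZooKeeper servers answer the four-letter command `ruok` with `imok` on their client port. The Python sketch below (hypothetical helper names) sends it to each quorum member on port 2222, the value from Jeff's hbase-site.xml, so a host that is not listening on that port shows up immediately. Newer ZooKeeper releases may require four-letter commands to be whitelisted on the server, so treat a negative answer there with care.

```python
import socket


def zk_ruok(host: str, port: int = 2222, timeout: float = 3.0) -> bool:
    """Send ZooKeeper's 'ruok' command; a serving ZooKeeper node answers 'imok'.

    The default port 2222 is Jeff's non-default setting; stock ZooKeeper uses 2181.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            while len(reply) < 4:
                chunk = sock.recv(4 - len(reply))
                if not chunk:  # server closed the connection
                    break
                reply += chunk
    except OSError:  # refused, unreachable, unresolvable or timed out
        return False
    return reply == b"imok"


def check_quorum(hosts, port: int = 2222, timeout: float = 3.0) -> dict:
    """Map each quorum host to True/False depending on its 'ruok' answer."""
    return {host: zk_ruok(host, port, timeout) for host in hosts}
```

For example, `check_quorum(["sha-cs-01", "sha-cs-02", "sha-cs-03", "sha-cs-05", "sha-cs-06"])` run from each region server would show whether every quorum member is reachable on the configured port from that box.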
>> >>
>> >>     - Andy
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> ________________________________
>> >> From: Jeff Zhang <[email protected]>
>> >> To: [email protected]
>> >> Sent: Tue, November 10, 2009 2:40:28 PM
>> >> Subject: Re: HBase 0.20.1 Distributed Install Problems
>> >>
>> >> Hi,
>> >>
>> >> I have the same problem: I cannot start the region servers.
>> >>
>> >> When I invoke zk_dump, it shows:
>> >>
>> >> HBase tree in ZooKeeper is rooted at /hbase
>> >>  Cluster up? true
>> >>  In safe mode? true
>> >>  Master address: 10.148.224.13:60000
>> >>  Region server holding ROOT: null
>> >>  Region servers:
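Compared with the healthy output Tatsuya shows further down, this dump says the master is up but no region server has registered, ROOT is unassigned and the cluster is still in safe mode. If you want to check this repeatedly, a minimal Python sketch (hypothetical helpers, assuming the zk_dump text format shown in this thread) can turn the dump into a few diagnostics:

```python
from __future__ import annotations


def parse_zk_dump(text: str) -> dict:
    """Parse the summary printed by the HBase shell's zk_dump command."""
    info: dict = {"region_servers": []}
    in_servers = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Cluster up?"):
            info["cluster_up"] = line.endswith("true")
        elif line.startswith("In safe mode?"):
            info["safe_mode"] = line.endswith("true")
        elif line.startswith("Master address:"):
            info["master"] = line.split(":", 1)[1].strip()
        elif line.startswith("Region server holding ROOT:"):
            value = line.split(":", 1)[1].strip()
            info["root_server"] = None if value == "null" else value
        elif line.startswith("Region servers:"):
            in_servers = True
        elif in_servers and line.startswith("- "):
            info["region_servers"].append(line[2:].strip())
    return info


def diagnose(info: dict) -> list[str]:
    """Turn a parsed dump into warnings (an empty list means it looks healthy)."""
    problems = []
    if not info.get("region_servers"):
        problems.append("no region servers registered in ZooKeeper")
    if info.get("root_server") is None:
        problems.append("ROOT region is not assigned")
    if info.get("safe_mode"):
        problems.append("master is still in safe mode")
    return problems
```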
>> >>
>> >>
>> >> The following is my hbase-site.xml
>> >>
>> >> <configuration>
>> >>  <property>
>> >>    <name>hbase.cluster.distributed</name>
>> >>    <value>true</value>
>> >>    <description>The mode the cluster will be in. Possible values are
>> >>      false: standalone and pseudo-distributed setups with managed Zookeeper
>> >>      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
>> >>    </description>
>> >>  </property>
>> >>  <property>
>> >>    <name>hbase.rootdir</name>
>> >>    <value>hdfs://sha-cs-04:9000/hbase</value>
>> >>    <description>The directory shared by region servers.
>> >>    </description>
>> >>  </property>
>> >>  <property>
>> >>      <name>hbase.zookeeper.property.clientPort</name>
>> >>      <value>2222</value>
>> >>      <description>Property from ZooKeeper's config zoo.cfg.
>> >>      The port at which the clients will connect.
>> >>      </description>
>> >>   </property>
>> >>   <property>
>> >>      <name>hbase.zookeeper.quorum</name>
>> >>      <value>sha-cs-01,sha-cs-02,sha-cs-03,sha-cs-05,sha-cs-06</value>
>> >>      <description>Comma separated list of servers in the ZooKeeper Quorum.
>> >>      For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
>> >>      By default this is set to localhost for local and pseudo-distributed modes
>> >>      of operation. For a fully-distributed setup, this should be set to a full
>> >>      list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
>> >>      this is the list of servers which we will start/stop ZooKeeper on.
>> >>      </description>
>> >>    </property>
>> >>
>> >> </configuration>
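Since the master, the region servers and the shell each read their own copy of hbase-site.xml, one thing worth ruling out is a node that is missing the non-default clientPort of 2222 and therefore tries ZooKeeper on 2181. A minimal Python sketch using the standard xml.etree.ElementTree module (function names are hypothetical) prints the ZooKeeper endpoints a given copy of the file implies; the fallbacks assume the stock defaults of localhost and 2181.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def read_hbase_site(xml_text: str) -> dict[str, str]:
    """Return {name: value} for every <property> in an hbase-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def zk_endpoints(props: dict[str, str]) -> list[tuple[str, int]]:
    """(host, port) pairs that processes using this file would be expected to try.

    Assumes the stock defaults (quorum "localhost", client port 2181)
    when the properties are absent.
    """
    quorum = props.get("hbase.zookeeper.quorum", "localhost")
    port = int(props.get("hbase.zookeeper.property.clientPort", "2181"))
    return [(host.strip(), port) for host in quorum.split(",") if host.strip()]
```

Running this against the conf/hbase-site.xml on every node and comparing the results is a cheap way to confirm all copies agree.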
>> >>
>> >> What's wrong with my configuration?
>> >>
>> >>
>> >> Thank you in advance.
>> >>
>> >>
>> >> Jeff Zhang
>> >>
>> >>
>> >>
>> >> On Tue, Nov 10, 2009 at 12:47 PM, Tatsuya Kawano
>> >> <[email protected]> wrote:
>> >>
>> >> > Hello,
>> >> >
>> >> > It looks like the master and the region servers cannot locate each
>> >> > other. HBase 0.20.x uses ZooKeeper (zk) to locate other cluster
>> >> > members, so maybe your zk has wrong information.
>> >> >
>> >> > Can you run zk_dump from the HBase shell and let us know the result?
>> >> >
>> >> > If the cluster is properly configured, you'll get something like this:
>> >> > =====================================
>> >> > hbase(main):007:0> zk_dump
>> >> >
>> >> > HBase tree in ZooKeeper is rooted at /hbase
>> >> >  Cluster up? true
>> >> >  In safe mode? false
>> >> >  Master address: 172.16.80.26:60000
>> >> >  Region server holding ROOT: 172.16.80.27:60020
>> >> >  Region servers:
>> >> >   - 172.16.80.27:60020
>> >> >   - 172.16.80.29:60020
>> >> >   - 172.16.80.28:60020
>> >> > =====================================
>> >> >
>> >> >
>> >> > > One of my co-workers apparently can log into his box and submit jobs,
>> >> > > but everyone else, including me, is still unable to log in.
>> >> >
>> >> > Maybe you're a bit confused; your co-worker seems to be able to use
>> >> > Hadoop Map/Reduce, not HBase.
>> >> >
>> >> >
>> >> > > Does Hbase allow concurrent connections?
>> >> >
>> >> > Yes.
>> >> >
>> >> >
>> >> > >> I think it also says the master is on port 60000
>> >> > >> when the install directions say it's supposed to be 60010?
>> >> >
>> >> > Port 60000 is correct. The master uses port 60000 to accept connections
>> >> > from the HBase shell and region servers. Port 60010 is for the web-based
>> >> > HBase console.
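To see which of the two master ports is actually reachable from a given box, a plain TCP connect is enough. The Python sketch below is illustrative only (hypothetical names): it does not speak HBase RPC, it just reports whether something accepts connections on 60000 and 60010.

```python
import socket

# Master ports as described above (0.20.x defaults).
MASTER_PORTS = {
    60000: "master RPC (HBase shell, region servers)",
    60010: "master web UI",
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_master(host: str, timeout: float = 2.0) -> dict:
    """Report reachability of each master port, keyed by its description."""
    return {desc: port_open(host, port, timeout) for port, desc in MASTER_PORTS.items()}
```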
>> >> >
>> >> >
>> >> > > We tried applying this fix (to explicitly set the master):
>> >> > > http://osdir.com/ml/hbase-user-hadoop-apache/2009-05/msg00321.html
>> >> >
>> >> > No, this is the old way to configure a cluster. You shouldn't use it
>> >> > with HBase 0.20.x.
>> >> >
>> >> >
>> >> > Thanks,
>> >> >
>> >> > --
>> >> > Tatsuya Kawano (Mr.)
>> >> > Tokyo, Japan
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Nov 10, 2009 at 1:10 PM, Chris Bates
>> >> > <[email protected]> wrote:
>> >> > > Another interesting data point. We tried applying this fix (to explicitly
>> >> > > set the master):
>> >> > > http://osdir.com/ml/hbase-user-hadoop-apache/2009-05/msg00321.html
>> >> > >
>> >> > > But when I log in to the master node, it takes a really long time to
>> >> > > submit a query, and I get this in response:
>> >> > > hbase(main):001:0> list
>> >> > > NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server null for region , row '', but failed after 5 attempts.
>> >> > > Exceptions:
>> >> > > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>> >> > > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>> >> > > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>> >> > > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>> >> > > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>> >> > >
>> >> > >  from org/apache/hadoop/hbase/client/HConnectionManager.java:1001:in `getRegionServerWithRetries'
>> >> > >  from org/apache/hadoop/hbase/client/MetaScanner.java:55:in `metaScan'
>> >> > >  from org/apache/hadoop/hbase/client/MetaScanner.java:28:in `metaScan'
>> >> > >  from org/apache/hadoop/hbase/client/HConnectionManager.java:432:in `listTables'
>> >> > >  from org/apache/hadoop/hbase/client/HBaseAdmin.java:127:in `listTables'
>> >> > >  from sun/reflect/NativeMethodAccessorImpl.java:-2:in `invoke0'
>> >> > >  from sun/reflect/NativeMethodAccessorImpl.java:39:in `invoke'
>> >> > >  from sun/reflect/DelegatingMethodAccessorImpl.java:25:in `invoke'
>> >> > >  from java/lang/reflect/Method.java:597:in `invoke'
>> >> > >  from org/jruby/javasupport/JavaMethod.java:298:in `invokeWithExceptionHandling'
>> >> > >  from org/jruby/javasupport/JavaMethod.java:259:in `invoke'
>> >> > >  from org/jruby/java/invokers/InstanceMethodInvoker.java:36:in `call'
>> >> > >  from org/jruby/runtime/callsite/CachingCallSite.java:253:in `cacheAndCall'
>> >> > >  from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
>> >> > >  from org/jruby/ast/CallNoArgNode.java:61:in `interpret'
>> >> > >  from org/jruby/ast/ForNode.java:104:in `interpret'
>> >> > > ... 116 levels...
>> >> > >  from opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb#start:-1:in `call'
>> >> > >  from org/jruby/internal/runtime/methods/DynamicMethod.java:226:in `call'
>> >> > >  from org/jruby/internal/runtime/methods/CompiledMethod.java:211:in `call'
>> >> > >  from org/jruby/internal/runtime/methods/CompiledMethod.java:71:in `call'
>> >> > >  from org/jruby/runtime/callsite/CachingCallSite.java:253:in `cacheAndCall'
>> >> > >  from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
>> >> > >  from opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb.rb:497:in `__file__'
>> >> > >  from opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb.rb:-1:in `load'
>> >> > >  from org/jruby/Ruby.java:577:in `runScript'
>> >> > >  from org/jruby/Ruby.java:480:in `runNormally'
>> >> > >  from org/jruby/Ruby.java:354:in `runFromMain'
>> >> > >  from org/jruby/Main.java:229:in `run'
>> >> > >  from org/jruby/Main.java:110:in `run'
>> >> > >  from org/jruby/Main.java:94:in `main'
>> >> > >  from /opt/hadoop/hbase-0.20.1/bin/../bin/hirb.rb:338:in `list'
>> >> > >  from (hbase):2
>> >> > > hbase(main):002:0>
>> >> > >
>> >> > >
>> >> > > On Mon, Nov 9, 2009 at 10:52 PM, Chris Bates
>> >> > > <[email protected]> wrote:
>> >> > >
>> >> > >> Thanks for your response, Sujee. These boxes are all on an internal DNS
>> >> > >> and they all resolve.
>> >> > >>
>> >> > >> One of my co-workers apparently can log into his box and submit jobs,
>> >> > >> but everyone else, including me, is still unable to log in. Does HBase
>> >> > >> allow concurrent connections? In Hive I remember having to configure the
>> >> > >> metastore to be in server mode if multiple people were using it.
>> >> > >>
>> >> > >>
>> >> > >> On Mon, Nov 9, 2009 at 10:13 PM, Sujee Maniyam <[email protected]> wrote:
>> >> > >>
>> >> > >>> > [had...@crunch hbase-0.20.1]$ bin/start-hbase.sh
>> >> > >>> >
>> >> > >>> > crunch2: Warning: Permanently added 'crunch2' (RSA) to the list of known hosts.
>> >> > >>>
>> >> > >>>
>> >> > >>> Is your SSH set up correctly? From the master, you need to be able to
>> >> > >>> log in to all slaves/region servers without a password.
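A non-interactive way to test this from the master is to run a no-op command over SSH with password prompts disabled. The Python sketch below assumes the OpenSSH client, where `-o BatchMode=yes` makes ssh fail instead of prompting; the helper names are hypothetical. Batch mode may also fail on a host key that is not yet in known_hosts, so accept each key once interactively first.

```python
from __future__ import annotations

import subprocess
from typing import Optional


def ssh_check_command(host: str, user: Optional[str] = None) -> list[str]:
    """Build an OpenSSH command that runs `true` remotely without prompting."""
    target = f"{user}@{host}" if user else host
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", target, "true"]


def can_ssh_without_password(host: str, user: Optional[str] = None) -> bool:
    """True if key-based SSH to host works (remote `true` exits with status 0)."""
    try:
        result = subprocess.run(
            ssh_check_command(host, user), capture_output=True, timeout=15
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # ssh not installed, or the attempt hung
    return result.returncode == 0
```

For example, `can_ssh_without_password("crunch2")` run on the master should return True once key-based login is in place.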
>> >> > >>>
>> >> > >>> And I see you are using short hostnames (crunch2, crunch3). Do they
>> >> > >>> all resolve correctly? If not, you need to update /etc/hosts on all
>> >> > >>> machines to resolve them to IP addresses.
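The /etc/hosts check can also be scripted. The hedged Python sketch below (hypothetical helper names) parses hosts-file text and reports any of the given cluster hostnames that map to a loopback address, which is one way a node can end up registering as localhost. It assumes the first matching line wins, which is how the files backend typically behaves.

```python
from __future__ import annotations

import ipaddress


def parse_hosts(text: str) -> dict[str, str]:
    """Map each hostname/alias to the address on the first line that lists it."""
    mapping: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) < 2:
            continue
        addr, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name, addr)  # first match wins
    return mapping


def loopback_mapped(mapping: dict[str, str], hostnames: list[str]) -> dict[str, str]:
    """Cluster hostnames that resolve to a loopback address in this hosts file."""
    bad = {}
    for name in hostnames:
        addr = mapping.get(name)
        if addr is None:
            continue
        try:
            if ipaddress.ip_address(addr).is_loopback:
                bad[name] = addr
        except ValueError:
            continue  # not a plain IP literal
    return bad
```

Feeding it the contents of /etc/hosts from each machine together with the names in conf/regionservers would show which boxes still map their own name to 127.x.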
>> >> > >>>
>> >> > >>> regards
>> >> > >>> Sujee Maniyam
>> >> > >>> --
>> >> > >>> http://sujee.net
>> >> > >>>
>> >> > >>
>> >> > >>
>> >> > >
>> >> >
>> >>
>> >>
>> >>
>> >>
>> >>
>> >
>>
>
