Re: HBase 0.20.1 Distributed Install Problems

Jean-Daniel Cryans Tue, 10 Nov 2009 17:48:03 -0800

This particular problem is fixed in the current 0.20 branch, and we
just released a candidate for 0.20.2. You can get it here:
http://people.apache.org/~jdcryans/hbase-0.20.2-candidate-1/


J-D

On Tue, Nov 10, 2009 at 5:43 PM, Jeff Zhang <[email protected]> wrote:
> The following is the region server's log :
>
>
> 2009-11-10 18:09:08,062 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 3 on 60020: starting
> 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 4 on 60020: starting
> 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 5 on 60020: starting
> 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 6 on 60020: starting
> 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 7 on 60020: starting
> 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 8 on 60020: starting
> 2009-11-10 18:09:08,063 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: HRegionServer started
> at: 10.148.224.11:60020
> 2009-11-10 18:09:08,064 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 9 on 60020: starting
> 2009-11-10 18:09:08,070 INFO org.apache.hadoop.hbase.regionserver.StoreFile:
> Allocating LruBlockCache with maximum size 198.3m
> 2009-11-10 18:09:08,095 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_CALL_SERVER_STARTUP
> 2009-11-10 18:09:08,229 INFO org.apache.hadoop.hbase.regionserver.HLog: HLog
> configuration: blocksize=67108864, rollsize=63753420, enabled=true,
> flushlogentries=100, optionallogflushinternal=10000ms
> 2009-11-10 18:09:08,253 INFO org.apache.hadoop.hbase.regionserver.HLog: New
> hlog /hbase/.logs/10.148.224.11,60020,1257847748205/hlog.dat.1257847748229
> 2009-11-10 18:09:08,255 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at
> 10.148.224.13:60000 that we are up
> 2009-11-10 18:09:08,302 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: Unhandled exception.
> Aborting...
> java.lang.NullPointerException
>        at
> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:459)
>        at java.lang.Thread.run(Thread.java:619)
> 2009-11-10 18:09:08,304 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
> request=0.0, regions=0, stores=0, storefiles=0, storefileIndexSize=0,
> memstoreSize=0, usedHeap=31, maxHeap=991,
> blockCacheSize=1707288, blockCacheFree=206264664, blockCacheCount=0,
> blockCacheHitRatio=0
> 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
> server on 60020
> 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 0 on 60020: exiting
> 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
> Server listener on 60020
> 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 1 on 60020: exiting
> 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 2 on 60020: exiting
> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 3 on 60020: exiting
> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 4 on 60020: exiting
> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 5 on 60020: exiting
> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 6 on 60020: exiting
> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 7 on 60020: exiting
> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 8 on 60020: exiting
> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 9 on 60020: exiting
> 2009-11-10 18:09:08,306 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
> 2009-11-10 18:09:08,307 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
> Server Responder
> 2009-11-10 18:09:08,412 INFO
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher:
> regionserver/127.0.0.1:60020.cacheFlusher exiting
> 2009-11-10 18:09:08,412 INFO
> org.apache.hadoop.hbase.regionserver.LogFlusher:
> regionserver/127.0.0.1:60020.logFlusher exiting
> 2009-11-10 18:09:08,412 INFO
> org.apache.hadoop.hbase.regionserver.CompactSplitThread:
> regionserver/127.0.0.1:60020.compactor exiting
> 2009-11-10 18:09:08,412 INFO org.apache.hadoop.hbase.regionserver.LogRoller:
> LogRoller exiting.
> 2009-11-10 18:09:08,413 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker:
> regionserver/127.0.0.1:60020.majorCompactionChecker exiting
> 2009-11-10 18:09:08,427 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: On abort, closed hlog
> 2009-11-10 18:09:08,428 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at:
> 10.148.224.11:60020
> 2009-11-10 18:09:17,489 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
> 2009-11-10 18:09:17,489 INFO org.apache.zookeeper.ZooKeeper: Closing
> session: 0x324dcceb05c0003
> 2009-11-10 18:09:17,490 INFO org.apache.zookeeper.ClientCnxn: Closing
> ClientCnxn for session: 0x324dcceb05c0003
> 2009-11-10 18:09:17,495 INFO org.apache.hadoop.hbase.Leases:
> regionserver/127.0.0.1:60020.leaseChecker closing leases
> 2009-11-10 18:09:17,495 INFO org.apache.hadoop.hbase.Leases:
> regionserver/127.0.0.1:60020.leaseChecker closed leases
> 2009-11-10 18:09:17,500 INFO org.apache.zookeeper.ClientCnxn: Exception
> while closing send thread for session 0x324dcceb05c0003 : Read error rc = -1
> java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
> 2009-11-10 18:09:17,604 INFO org.apache.zookeeper.ClientCnxn: Disconnecting
> ClientCnxn for session: 0x324dcceb05c0003
> 2009-11-10 18:09:17,604 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x324dcceb05c0003 closed
> 2009-11-10 18:09:17,605 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/
> 127.0.0.1:60020 exiting
> 2009-11-10 18:09:17,605 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> 2009-11-10 18:09:17,606 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown
> thread.
> 2009-11-10 18:09:17,606 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
>
> On Tue, Nov 10, 2009 at 10:55 PM, Andrew Purtell <[email protected]> wrote:
>
>> When you try to start the region servers, what do you see in the log?
>>
>> If you don't change the client port (hbase.zookeeper.property.clientPort),
>> does it work?
>>
>>     - Andy
>>
>>
>>
>>
>>
>> ________________________________
>> From: Jeff Zhang <[email protected]>
>> To: [email protected]
>> Sent: Tue, November 10, 2009 2:40:28 PM
>> Subject: Re: HBase 0.20.1 Distributed Install Problems
>>
>> Hi,
>>
>> I have the same problem: I cannot start the region server.
>>
>> When I invoke zk_dump
>>
>> it shows:
>>
>> HBase tree in ZooKeeper is rooted at /hbase
>>  Cluster up? true
>>  In safe mode? true
>>  Master address: 10.148.224.13:60000
>>  Region server holding ROOT: null
>>  Region servers:
>>
>>
>> The following is my hbase-site.xml
>>
>> <configuration>
>>  <property>
>>    <name>hbase.cluster.distributed</name>
>>    <value>true</value>
>>    <description>The mode the cluster will be in. Possible values are
>>      false: standalone and pseudo-distributed setups with managed Zookeeper
>>      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
>>    </description>
>>  </property>
>>  <property>
>>    <name>hbase.rootdir</name>
>>    <value>hdfs://sha-cs-04:9000/hbase</value>
>>    <description>The directory shared by region servers.
>>    </description>
>>  </property>
>>  <property>
>>      <name>hbase.zookeeper.property.clientPort</name>
>>      <value>2222</value>
>>      <description>Property from ZooKeeper's config zoo.cfg.
>>      The port at which the clients will connect.
>>      </description>
>>   </property>
>>   <property>
>>      <name>hbase.zookeeper.quorum</name>
>>      <value>sha-cs-01,sha-cs-02,sha-cs-03,sha-cs-05,sha-cs-06</value>
>>      <description>Comma separated list of servers in the ZooKeeper Quorum.
>>      For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
>>      By default this is set to localhost for local and pseudo-distributed modes
>>      of operation. For a fully-distributed setup, this should be set to a full
>>      list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
>>      this is the list of servers which we will start/stop ZooKeeper on.
>>      </description>
>>    </property>
>>
>> </configuration>
>>
>> What's wrong with my configuration ?
>>
>>
>> Thank you in advance.
>>
>>
>> Jeff Zhang
>>
>>
>>
>> On Tue, Nov 10, 2009 at 12:47 PM, Tatsuya Kawano
>> <[email protected]> wrote:
>>
>> > Hello,
>> >
>> > It looks like the master and the region servers cannot locate each
>> > other. HBase 0.20.x uses ZooKeeper (zk) to locate other cluster
>> > members, so maybe your zk has wrong information.
>> >
>> > Can you type zk_dump from hbase shell and let us know the result?
>> >
>> > If the cluster is properly configured, you'll get something like this:
>> > =====================================
>> > hbase(main):007:0> zk_dump
>> >
>> > HBase tree in ZooKeeper is rooted at /hbase
>> >  Cluster up? true
>> >  In safe mode? false
>> >  Master address: 172.16.80.26:60000
>> >  Region server holding ROOT: 172.16.80.27:60020
>> >  Region servers:
>> >   - 172.16.80.27:60020
>> >   - 172.16.80.29:60020
>> >   - 172.16.80.28:60020
>> > =====================================
>> >
>> >
>> > > one of my co-workers apparently can log into his box and submit jobs,
>> but
>> > > me or anyone else is still unable to log in.
>> >
>> > Maybe you're a bit confused; your co-worker seems to be able to use
>> > Hadoop Map/Reduce, not HBase.
>> >
>> >
>> > > Does Hbase allow concurrent connections?
>> >
>> > Yes.
>> >
>> >
>> > >> I think it also says the master is on port 60000
>> > >> when the install directions say its supposed to be 60010?
>> >
>> > Port 60000 is correct. The master uses port 60000 to accept connection
>> > from hbase shell and region servers. Port 60010 is for the web-based
>> > HBase console.
>> >
>> >
>> > > We tried applying this fix (to explicitly set the master):
>> > > http://osdir.com/ml/hbase-user-hadoop-apache/2009-05/msg00321.html
>> >
>> > No, this is an old way to configure a cluster. You shouldn't use this
>> > with HBase 0.20.x
>> >
>> >
>> > Thanks,
>> >
>> > --
>> > Tatsuya Kawano (Mr.)
>> > Tokyo, Japan
>> >
>> >
>> >
>> > On Tue, Nov 10, 2009 at 1:10 PM, Chris Bates
>> > <[email protected]> wrote:
>> > > Another interesting data point.  We tried applying this fix (to
>> > explicitly
>> > > set the master):
>> > > http://osdir.com/ml/hbase-user-hadoop-apache/2009-05/msg00321.html
>> > >
>> > > But when I log in to the master node, it takes really long to submit a
>> > query
>> > > and I get this in response:
>> > > hbase(main):001:0> list
>> > > NativeException:
>> > org.apache.hadoop.hbase.client.RetriesExhaustedException:
>> > > Trying to contact region server null for region , row '', but failed
>> > after 5
>> > > attempts.
>> > > Exceptions:
>> > > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
>> > trying
>> > > to locate root region
>> > > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
>> > trying
>> > > to locate root region
>> > > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
>> > trying
>> > > to locate root region
>> > > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
>> > trying
>> > > to locate root region
>> > > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
>> > trying
>> > > to locate root region
>> > >
>> > > from org/apache/hadoop/hbase/client/HConnectionManager.java:1001:in
>> > > `getRegionServerWithRetries'
>> > >  from org/apache/hadoop/hbase/client/MetaScanner.java:55:in `metaScan'
>> > > from org/apache/hadoop/hbase/client/MetaScanner.java:28:in `metaScan'
>> > >  from org/apache/hadoop/hbase/client/HConnectionManager.java:432:in
>> > > `listTables'
>> > > from org/apache/hadoop/hbase/client/HBaseAdmin.java:127:in `listTables'
>> > >  from sun/reflect/NativeMethodAccessorImpl.java:-2:in `invoke0'
>> > > from sun/reflect/NativeMethodAccessorImpl.java:39:in `invoke'
>> > >  from sun/reflect/DelegatingMethodAccessorImpl.java:25:in `invoke'
>> > > from java/lang/reflect/Method.java:597:in `invoke'
>> > >  from org/jruby/javasupport/JavaMethod.java:298:in
>> > > `invokeWithExceptionHandling'
>> > > from org/jruby/javasupport/JavaMethod.java:259:in `invoke'
>> > >  from org/jruby/java/invokers/InstanceMethodInvoker.java:36:in `call'
>> > > from org/jruby/runtime/callsite/CachingCallSite.java:253:in
>> > `cacheAndCall'
>> > >  from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
>> > > from org/jruby/ast/CallNoArgNode.java:61:in `interpret'
>> > >  from org/jruby/ast/ForNode.java:104:in `interpret'
>> > > ... 116 levels...
>> > > from
>> > >
>> opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb#start:-1:in
>> > > `call'
>> > >  from org/jruby/internal/runtime/methods/DynamicMethod.java:226:in
>> `call'
>> > > from org/jruby/internal/runtime/methods/CompiledMethod.java:211:in
>> `call'
>> > >  from org/jruby/internal/runtime/methods/CompiledMethod.java:71:in
>> `call'
>> > > from org/jruby/runtime/callsite/CachingCallSite.java:253:in
>> > `cacheAndCall'
>> > >  from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
>> > > from
>> > opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb.rb:497:in
>> > > `__file__'
>> > >  from
>> > opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb.rb:-1:in
>> > > `load'
>> > > from org/jruby/Ruby.java:577:in `runScript'
>> > >  from org/jruby/Ruby.java:480:in `runNormally'
>> > > from org/jruby/Ruby.java:354:in `runFromMain'
>> > >  from org/jruby/Main.java:229:in `run'
>> > > from org/jruby/Main.java:110:in `run'
>> > >  from org/jruby/Main.java:94:in `main'
>> > > from /opt/hadoop/hbase-0.20.1/bin/../bin/hirb.rb:338:in `list'
>> > >  from (hbase):2hbase(main):002:0>
>> > >
>> > >
>> > > On Mon, Nov 9, 2009 at 10:52 PM, Chris Bates <
>> > > [email protected]> wrote:
>> > >
>> > >> thanks for your response Sujee.  These boxes are all on an internal
>> DNS
>> > and
>> > >> they all resolve.
>> > >>
>> > >> one of my co-workers apparently can log into his box and submit jobs,
>> > but
>> > >> me or anyone else is still unable to log in.  Does Hbase allow
>> > concurrent
>> > >> connections?  In Hive I remember having to configure the metastore to
>> be
>> > in
>> > >> server mode if multiple people were using it.
>> > >>
>> > >>
>> > >> On Mon, Nov 9, 2009 at 10:13 PM, Sujee Maniyam <[email protected]>
>> wrote:
>> > >>
>> > >>> > [had...@crunch hbase-0.20.1]$ bin/start-hbase.sh
>> > >>> >
>> > >>> > crunch2: Warning: Permanently added 'crunch2' (RSA) to the list of
>> > known
>> > >>> > hosts.
>> > >>>
>> > >>>
>> > >>> Is your SSH set up correctly?  From the master, you need to be able to
>> > >>> log in to all slaves/regionservers without a password.
>> > >>>
>> > >>> And I see you are using short hostnames (crunch2, crunch3). Do they
>> > >>> all resolve correctly, or do you need to update /etc/hosts to resolve
>> > >>> these to an IP address on all machines?
>> > >>>
>> > >>> regards
>> > >>> Sujee Maniyam
>> > >>> --
>> > >>> http://sujee.net
>> > >>>
>> > >>
>> > >>
>> > >
>> >
>>
>>
>>
>>
>>
>
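
A note on the hostname question quoted above (do the names all resolve, or does /etc/hosts need updating?). The region server log in this thread reports "HRegionServer started at: 10.148.224.11:60020" while its threads are named regionserver/127.0.0.1:60020, so it may be worth checking what each name resolves to on every machine. Whether that has anything to do with the NullPointerException is not established in this thread. What follows is only a rough sketch in Python: classify_hosts is a made-up helper name, and the host list is copied from the hbase-site.xml quoted above.

```python
import ipaddress
import socket


def classify_hosts(hosts, resolve=socket.gethostbyname):
    """Map each host to (ip_or_None, status).

    status is 'ok', 'loopback' (resolves to 127.0.0.0/8) or 'unresolved'.
    The resolve argument exists only so the lookup can be swapped out.
    """
    results = {}
    for host in hosts:
        try:
            ip = resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[host] = (None, "unresolved")
            continue
        status = "loopback" if ipaddress.ip_address(ip).is_loopback else "ok"
        results[host] = (ip, status)
    return results


if __name__ == "__main__":
    # Quorum hosts as listed in the quoted hbase-site.xml, plus this machine.
    quorum = "sha-cs-01,sha-cs-02,sha-cs-03,sha-cs-05,sha-cs-06".split(",")
    report = classify_hosts(quorum + [socket.gethostname()])
    for host, (ip, status) in report.items():
        print(f"{host:12} {ip or '-':15} {status}")
```

Run it on the master and on each region server. A machine whose own hostname comes back as "loopback" or "unresolved" is the one to look at first.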

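The two zk_dump outputs quoted in this thread differ in exactly the places that matter: the healthy one lists region servers and a ROOT holder, the failing one shows "Region server holding ROOT: null" and an empty server list. If you want to check that from a script, something like the sketch below might do. It assumes the dump text looks like the samples quoted above (other HBase versions may print it differently), and parse_zk_dump is a hypothetical helper, not part of HBase.

```python
def parse_zk_dump(text):
    """Extract master, ROOT holder and region servers from zk_dump text.

    Assumes the layout shown in the zk_dump samples quoted in this thread.
    """
    info = {"master": None, "root_holder": None, "region_servers": []}
    in_servers = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Master address:"):
            info["master"] = line.split(":", 1)[1].strip()
        elif line.startswith("Region server holding ROOT:"):
            value = line.split(":", 1)[1].strip()
            info["root_holder"] = None if value == "null" else value
        elif line.startswith("Region servers:"):
            in_servers = True
        elif in_servers and line.startswith("- "):
            info["region_servers"].append(line[2:].strip())
    return info


# The failing dump quoted in this thread.
FAILING = """HBase tree in ZooKeeper is rooted at /hbase
 Cluster up? true
 In safe mode? true
 Master address: 10.148.224.13:60000
 Region server holding ROOT: null
 Region servers:
"""

if __name__ == "__main__":
    info = parse_zk_dump(FAILING)
    if not info["region_servers"]:
        print("no region servers registered with master", info["master"])
```

An empty region_servers list together with a root_holder of None matches the symptom reported here: the master is up, but no region server has checked in.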