Re: HBase 0.20.1 Distributed Install Problems

Jean-Daniel Cryans Tue, 10 Nov 2009 20:13:03 -0800

Chris,

One very important thing to understand is that the region servers are
one thing and the quorum members another.


The quorum members are the ZooKeeper nodes. ZooKeeper provides a highly
available distributed management system and typically runs on 3 or 5
nodes; HBase relies on it to be highly available itself. The quorum
members are listed in hbase.zookeeper.quorum in the file
conf/hbase-site.xml, and they listen on port 2181.

The region servers are the HBase "worker" nodes; they do the real
work. You put them on as many machines as you have (except 1 for the
master). They are listed in conf/regionservers, one per line, and they
listen on port 60020.
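
To make the distinction concrete, here is a minimal sketch (not part of
HBase) that checks whether each expected listener is reachable. It
assumes the two ports mentioned above; the function names read_hosts,
is_listening and report are made up for illustration, and the host
lists are whatever you pass in.

```python
import socket

ZK_CLIENT_PORT = 2181      # quorum members listen here (per the text above)
REGIONSERVER_PORT = 60020  # region servers listen here (per the text above)


def read_hosts(text):
    """Return the host names from a one-host-per-line listing, skipping blanks."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def report(quorum, regionservers):
    """Map 'host:port' to reachability for every expected listener."""
    targets = [(host, ZK_CLIENT_PORT) for host in quorum]
    targets += [(host, REGIONSERVER_PORT) for host in regionservers]
    return {"%s:%d" % (host, port): is_listening(host, port)
            for host, port in targets}
```

A possible use is to pass the contents of conf/regionservers through
read_hosts, give report the quorum host names by hand, and look for
entries that come back False. A successful connection only shows that
something accepts connections on that port, not that it is a healthy
ZooKeeper or region server.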

Finally, the talk that Stack and Jon gave at ApacheCon is helpful in
understanding issues encountered by new users: http://su.pr/28fwSE

Hope this helps,

J-D

On Tue, Nov 10, 2009 at 7:47 PM, Chris Bates
<[email protected]> wrote:
> Thanks everyone for your help.  We discovered a couple things:
>
> 1) Our Master Node was not in the ZK quorum.
> 2) Our hosts file was such that the regionservers were pinging against
> themselves, so we removed this line from our hosts file and made it so they
> had to go to the DNS to resolve their identity.  This is still a little
> unclear to me as one of my co-workers fixed this issue.
>
> We had some other problems, probably due to us messing with the configuration
> files so many times.  So I removed HBase from all the boxes.  Then I
> followed these instructions
> http://hadoop.apache.org/hbase/docs/r0.20.1/api/overview-summary.html#overview_description
> as Stack had suggested.  I then scp'd everything over to the other
> boxes...so ssh was working without password.
>
> The UI works.  I was able to run "list" and "create" at the command shell.
>  One weird thing though is this is my output from zk_dump:
> HBase tree in ZooKeeper is rooted at /hbase
>  Cluster up? true
>  In safe mode? false
>  Master address: 172.16.1.46:60000
>  Region server holding ROOT: 172.16.1.46:60020
>  Region servers:
>    - 172.16.1.46:60020
>
> Which says I only have 1 region server.  When I check the master UI it says
> there are 5 servers in the quorum--but only 1 regionserver.  All the
> regionservers are supposed to be on port 2181 like in the Wiki---if I change
> it to 2222 as someone had mentioned---nothing works.  I also have the same
> regionservers file in the conf directories that have 5 servers.  When I
> check regionserver UI log on 60030 it says this:
>
> 2009-11-10 22:37:31,683 INFO org.apache.zookeeper.ClientCnxn: Server
> connection successful
> 2009-11-10 22:37:31,708 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper
> event, state: SyncConnected, type: None, path: null
> 2009-11-10 22:37:31,860 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at
> 172.16.1.46:60000 that we are up
> 2009-11-10 22:38:03,070 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us
> address to use. Was=172.16.1.46:60020, Now=172.16.1.46
> 2009-11-10 22:38:03,505 INFO
> org.apache.hadoop.hbase.regionserver.HLog: HLog configuration:
> blocksize=67108864, rollsize=63753420, enabled=true,
> flushlogentries=100, optionallogflushinternal=10000ms
> 2009-11-10 22:38:03,727 INFO
> org.apache.hadoop.hbase.regionserver.HLog: New hlog
> /hbase/.logs/chanel2.local,60020,1257910682720/hlog.dat.1257910683505
> 2009-11-10 22:38:03,759 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=RegionServer,
> sessionId=regionserver/172.16.1.46:60020
> 2009-11-10 22:38:03,769 INFO
> org.apache.hadoop.hbase.regionserver.metrics.RegionServerMetrics:
> Initialized
> 2009-11-10 22:38:04,143 INFO org.apache.hadoop.http.HttpServer: Port
> returned by webServer.getConnectors()[0].getLocalPort() before open()
> is -1. Opening the listener on 60030
> 2009-11-10 22:38:04,144 INFO org.apache.hadoop.http.HttpServer:
> listener.getLocalPort() returned 60030
> webServer.getConnectors()[0].getLocalPort() returned 60030
> 2009-11-10 22:38:04,145 INFO org.apache.hadoop.http.HttpServer: Jetty
> bound to port 60030
> 2009-11-10 22:39:12,514 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server Responder: starting
> 2009-11-10 22:39:12,515 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server listener on 60020: starting
> 2009-11-10 22:39:12,517 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 0 on 60020: starting
> 2009-11-10 22:39:12,518 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 1 on 60020: starting
> 2009-11-10 22:39:12,518 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 2 on 60020: starting
> 2009-11-10 22:39:12,518 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 3 on 60020: starting
> 2009-11-10 22:39:12,519 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 4 on 60020: starting
> 2009-11-10 22:39:12,519 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 5 on 60020: starting
> 2009-11-10 22:39:12,519 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 6 on 60020: starting
> 2009-11-10 22:39:12,519 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 7 on 60020: starting
> 2009-11-10 22:39:12,520 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 8 on 60020: starting
> 2009-11-10 22:39:12,520 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> Server handler 9 on 60020: starting
> 2009-11-10 22:39:12,520 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: HRegionServer
> started at: 172.16.1.46:60020
> 2009-11-10 22:39:12,532 INFO
> org.apache.hadoop.hbase.regionserver.StoreFile: Allocating
> LruBlockCache with maximum size 199.7m
> 2009-11-10 22:39:12,587 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
> -ROOT-,,0
> 2009-11-10 22:39:12,595 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
> MSG_REGION_OPEN: -ROOT-,,0
> 2009-11-10 22:39:12,725 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: region
> -ROOT-,,0/70236052 available; sequence id is 3
> 2009-11-10 22:39:18,700 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
> .META.,,1
> 2009-11-10 22:39:18,706 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
> MSG_REGION_OPEN: .META.,,1
> 2009-11-10 22:39:18,729 INFO
> org.apache.hadoop.hbase.regionserver.HRegion: region
> .META.,,1/1028785192 available; sequence id is 0
>
>
>
> Another thing I don't understand.  If I start and stop hbase, I get this
> error when I check the Master UI if I don't first delete the old HBase copy
> in HDFS
>
> HTTP ERROR: 500
>
> Trying to contact region server null for region , row '', but failed
> after 3 attempts.
> Exceptions:
> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
> trying to locate root region
> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
> trying to locate root region
> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
> trying to locate root region
>
> RequestURI=/master.jsp
> Caused by:
>
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
> contact region server null for region , row '', but failed after 3
> attempts.
> Exceptions:
> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
> trying to locate root region
> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
> trying to locate root region
> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
> trying to locate root region
>
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1001)
>        at 
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:55)
>        at 
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:28)
>        at 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.listTables(HConnectionManager.java:432)
>        at 
> org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:127)
>        at 
> org.apache.hadoop.hbase.generated.master.master_jsp._jspService(master_jsp.java:125)
>        at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
>        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>        at 
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
>        at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
>        at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>        at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>        at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>        at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
>        at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>        at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>        at org.mortbay.jetty.Server.handle(Server.java:324)
>        at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
>        at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
>        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
>        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
>        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
>        at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
>        at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
>
>
>
> On Mon, Nov 9, 2009 at 11:47 PM, Tatsuya Kawano 
> <[email protected]> wrote:
>
>> Hello,
>>
>> It looks like the master and the region servers cannot locate each
>> other. HBase 0.20.x uses ZooKeeper (zk) to locate other cluster
>> members, so maybe your zk has wrong information.
>>
>> Can you type zk_dump from the hbase shell and let us know the result?
>>
>> If the cluster is properly configured, you'll get something like this:
>> =====================================
>> hbase(main):007:0> zk_dump
>>
>> HBase tree in ZooKeeper is rooted at /hbase
>>  Cluster up? true
>>  In safe mode? false
>>  Master address: 172.16.80.26:60000
>>  Region server holding ROOT: 172.16.80.27:60020
>>  Region servers:
>>   - 172.16.80.27:60020
>>   - 172.16.80.29:60020
>>   - 172.16.80.28:60020
>> =====================================
>>
>>
>> > one of my co-workers apparently can log into his box and submit jobs, but
>> > me or anyone else is still unable to log in.
>>
>> Maybe you're a bit confused; your co-worker seems to be able to use
>> Hadoop Map/Reduce, not HBase.
>>
>>
>> > Does Hbase allow concurrent connections?
>>
>> Yes.
>>
>>
>> >> I think it also says the master is on port 60000
>> >> when the install directions say it's supposed to be 60010?
>>
>> Port 60000 is correct. The master uses port 60000 to accept connections
>> from the hbase shell and the region servers. Port 60010 is for the
>> web-based HBase console.
>>
>>
>> > We tried applying this fix (to explicitly set the master):
>> > http://osdir.com/ml/hbase-user-hadoop-apache/2009-05/msg00321.html
>>
>> No, this is an old way to configure a cluster. You shouldn't use this
>> with HBase 0.20.x
>>
>>
>> Thanks,
>>
>> --
>> Tatsuya Kawano (Mr.)
>> Tokyo, Japan
>>
>>
>>
>> On Tue, Nov 10, 2009 at 1:10 PM, Chris Bates
>> <[email protected]> wrote:
>> > Another interesting data point.  We tried applying this fix (to
>> explicitly
>> > set the master):
>> > http://osdir.com/ml/hbase-user-hadoop-apache/2009-05/msg00321.html
>> >
>> > But when I log in to the master node, it takes really long to submit a
>> query
>> > and I get this in response:
>> > hbase(main):001:0> list
>> > NativeException:
>> org.apache.hadoop.hbase.client.RetriesExhaustedException:
>> > Trying to contact region server null for region , row '', but failed
>> after 5
>> > attempts.
>> > Exceptions:
>> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
>> trying
>> > to locate root region
>> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
>> trying
>> > to locate root region
>> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
>> trying
>> > to locate root region
>> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
>> trying
>> > to locate root region
>> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out
>> trying
>> > to locate root region
>> >
>> > from org/apache/hadoop/hbase/client/HConnectionManager.java:1001:in
>> > `getRegionServerWithRetries'
>> >  from org/apache/hadoop/hbase/client/MetaScanner.java:55:in `metaScan'
>> > from org/apache/hadoop/hbase/client/MetaScanner.java:28:in `metaScan'
>> >  from org/apache/hadoop/hbase/client/HConnectionManager.java:432:in
>> > `listTables'
>> > from org/apache/hadoop/hbase/client/HBaseAdmin.java:127:in `listTables'
>> >  from sun/reflect/NativeMethodAccessorImpl.java:-2:in `invoke0'
>> > from sun/reflect/NativeMethodAccessorImpl.java:39:in `invoke'
>> >  from sun/reflect/DelegatingMethodAccessorImpl.java:25:in `invoke'
>> > from java/lang/reflect/Method.java:597:in `invoke'
>> >  from org/jruby/javasupport/JavaMethod.java:298:in
>> > `invokeWithExceptionHandling'
>> > from org/jruby/javasupport/JavaMethod.java:259:in `invoke'
>> >  from org/jruby/java/invokers/InstanceMethodInvoker.java:36:in `call'
>> > from org/jruby/runtime/callsite/CachingCallSite.java:253:in
>> `cacheAndCall'
>> >  from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
>> > from org/jruby/ast/CallNoArgNode.java:61:in `interpret'
>> >  from org/jruby/ast/ForNode.java:104:in `interpret'
>> > ... 116 levels...
>> > from
>> > opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb#start:-1:in
>> > `call'
>> >  from org/jruby/internal/runtime/methods/DynamicMethod.java:226:in `call'
>> > from org/jruby/internal/runtime/methods/CompiledMethod.java:211:in `call'
>> >  from org/jruby/internal/runtime/methods/CompiledMethod.java:71:in `call'
>> > from org/jruby/runtime/callsite/CachingCallSite.java:253:in
>> `cacheAndCall'
>> >  from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
>> > from
>> opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb.rb:497:in
>> > `__file__'
>> >  from
>> opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb.rb:-1:in
>> > `load'
>> > from org/jruby/Ruby.java:577:in `runScript'
>> >  from org/jruby/Ruby.java:480:in `runNormally'
>> > from org/jruby/Ruby.java:354:in `runFromMain'
>> >  from org/jruby/Main.java:229:in `run'
>> > from org/jruby/Main.java:110:in `run'
>> >  from org/jruby/Main.java:94:in `main'
>> > from /opt/hadoop/hbase-0.20.1/bin/../bin/hirb.rb:338:in `list'
>> >  from (hbase):2
>> > hbase(main):002:0>
>> >
>> >
>> > On Mon, Nov 9, 2009 at 10:52 PM, Chris Bates <
>> > [email protected]> wrote:
>> >
>> >> thanks for your response Sujee.  These boxes are all on an internal DNS
>> and
>> >> they all resolve.
>> >>
>> >> one of my co-workers apparently can log into his box and submit jobs,
>> but
>> >> me or anyone else is still unable to log in.  Does Hbase allow
>> concurrent
>> >> connections?  In Hive I remember having to configure the metastore to be
>> in
>> >> server mode if multiple people were using it.
>> >>
>> >>
>> >> On Mon, Nov 9, 2009 at 10:13 PM, Sujee Maniyam <[email protected]> wrote:
>> >>
>> >>> > [had...@crunch hbase-0.20.1]$ bin/start-hbase.sh
>> >>> >
>> >>> > crunch2: Warning: Permanently added 'crunch2' (RSA) to the list of
>> known
>> >>> > hosts.
>> >>>
>> >>>
>> >>> Is your SSH set up correctly?  From the master, you need to be able to
>> >>> log in to all slaves/regionservers without a password.
>> >>>
>> >>> And I see you are using short hostnames (crunch2, crunch3), do they
>> >>> all resolve correctly?  or you need to update /etc/hosts to resolve
>> >>> these to an IP address on all machines.
>> >>>
>> >>> regards
>> >>> Sujee Maniyam
>> >>> --
>> >>> http://sujee.net
>> >>>
>> >>
>> >>
>> >
>>
>
