Chris,

One very important thing to understand is that the region servers are one thing and the quorum members are another.

The quorum members are part of ZooKeeper, which provides a highly available distributed management system and typically runs on 3 or 5 nodes. HBase relies on it to be highly available. The quorum members are listed in hbase.zookeeper.quorum in the file conf/hbase-site.xml, and they listen on port 2181.

The region servers are the HBase "worker" nodes; they do the real work. You put them on as many machines as you have (except 1 for the master). They are listed in conf/regionservers, one per line, and they listen on port 60020.

Finally, the talk that Stack and Jon gave at ApacheCon is helpful for understanding the issues encountered by new users: http://su.pr/28fwSE

Hope this helps,

J-D

On Tue, Nov 10, 2009 at 7:47 PM, Chris Bates <[email protected]> wrote:
> Thanks everyone for your help. We discovered a couple things:
>
> 1) Our Master Node was not in the ZK quorum.
> 2) Our hosts file was such that the regionservers were pinging against
> themselves, so we removed this line from our hosts file and made it so they
> had to go to the DNS to resolve their identity. This is still a little
> unclear to me as one of my co-workers fixed this issue.
>
> We had some other problems, probably due to us messing with the configuration
> files so many times. So I removed Hbase from all the boxes. Then I
> followed these instructions
> http://hadoop.apache.org/hbase/docs/r0.20.1/api/overview-summary.html#overview_description
> as stack had suggested. I then scp'd everything over to the other
> boxes...so ssh was working without password.
>
> The UI works. I was able to run "list" and "create" at the command shell.
> One weird thing though is this is my output from zk_dump:
> HBase tree in ZooKeeper is rooted at /hbase
> Cluster up? true
> In safe mode? false
> Master address: 172.16.1.46:60000
> Region server holding ROOT: 172.16.1.46:60020
> Region servers:
> - 172.16.1.46:60020
>
> Which says I only have 1 region server.
> When I check the master UI it says there are 5 servers in the quorum--but
> only 1 regionserver. All the regionservers are supposed to be on port 2181
> like in the Wiki---if I change it to 2222 as someone had mentioned---nothing
> works. I also have the same regionservers file in the conf directories that
> have 5 servers. When I check regionserver UI log on 60030 it says this:
>
> 2009-11-10 22:37:31,683 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
> 2009-11-10 22:37:31,708 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: SyncConnected, type: None, path: null
> 2009-11-10 22:37:31,860 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 172.16.1.46:60000 that we are up
> 2009-11-10 22:38:03,070 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us address to use. Was=172.16.1.46:60020, Now=172.16.1.46
> 2009-11-10 22:38:03,505 INFO org.apache.hadoop.hbase.regionserver.HLog: HLog configuration: blocksize=67108864, rollsize=63753420, enabled=true, flushlogentries=100, optionallogflushinternal=10000ms
> 2009-11-10 22:38:03,727 INFO org.apache.hadoop.hbase.regionserver.HLog: New hlog /hbase/.logs/chanel2.local,60020,1257910682720/hlog.dat.1257910683505
> 2009-11-10 22:38:03,759 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=RegionServer, sessionId=regionserver/172.16.1.46:60020
> 2009-11-10 22:38:03,769 INFO org.apache.hadoop.hbase.regionserver.metrics.RegionServerMetrics: Initialized
> 2009-11-10 22:38:04,143 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 60030
> 2009-11-10 22:38:04,144 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 60030 webServer.getConnectors()[0].getLocalPort() returned 60030
> 2009-11-10 22:38:04,145 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 60030
> 2009-11-10 22:39:12,514 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server Responder: starting
> 2009-11-10 22:39:12,515 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server listener on 60020: starting
> 2009-11-10 22:39:12,517 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60020: starting
> 2009-11-10 22:39:12,518 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020: starting
> 2009-11-10 22:39:12,518 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60020: starting
> 2009-11-10 22:39:12,518 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020: starting
> 2009-11-10 22:39:12,519 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60020: starting
> 2009-11-10 22:39:12,519 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60020: starting
> 2009-11-10 22:39:12,519 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60020: starting
> 2009-11-10 22:39:12,519 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60020: starting
> 2009-11-10 22:39:12,520 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60020: starting
> 2009-11-10 22:39:12,520 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020: starting
> 2009-11-10 22:39:12,520 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: HRegionServer started at: 172.16.1.46:60020
> 2009-11-10 22:39:12,532 INFO org.apache.hadoop.hbase.regionserver.StoreFile: Allocating LruBlockCache with maximum size 199.7m
> 2009-11-10 22:39:12,587 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: -ROOT-,,0
> 2009-11-10 22:39:12,595 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: -ROOT-,,0
> 2009-11-10 22:39:12,725 INFO org.apache.hadoop.hbase.regionserver.HRegion: region -ROOT-,,0/70236052 available; sequence id is 3
> 2009-11-10 22:39:18,700 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: .META.,,1
> 2009-11-10 22:39:18,706 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: .META.,,1
> 2009-11-10 22:39:18,729 INFO org.apache.hadoop.hbase.regionserver.HRegion: region .META.,,1/1028785192 available; sequence id is 0
>
> Another thing I don't understand. If I start and stop hbase, I get this
> error when I check the Master UI if I don't first delete the old HBase copy
> in HDFS
>
> HTTP ERROR: 500
>
> Trying to contact region server null for region , row '', but failed after 3 attempts.
> Exceptions:
> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>
> RequestURI=/master.jsp
> Caused by:
>
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server null for region , row '', but failed after 3 attempts.
> Exceptions:
> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1001)
>     at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:55)
>     at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:28)
>     at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.listTables(HConnectionManager.java:432)
>     at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:127)
>     at org.apache.hadoop.hbase.generated.master.master_jsp._jspService(master_jsp.java:125)
>     at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>     at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
>     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
>     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>     at org.mortbay.jetty.Server.handle(Server.java:324)
>     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
>     at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
>     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
>     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
>     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
>     at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
>     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
>
> On Mon, Nov 9, 2009 at 11:47 PM, Tatsuya Kawano <[email protected]> wrote:
>
>> Hello,
>>
>> It looks like the master and the region servers cannot locate each
>> other. HBase 0.20.x uses ZooKeeper (zk) to locate other cluster
>> members, so maybe your zk has wrong information.
>>
>> Can you type zk_dump from hbase shell and let us know the result?
>>
>> If the cluster is properly configured, you'll get something like this:
>> =====================================
>> hbase(main):007:0> zk_dump
>>
>> HBase tree in ZooKeeper is rooted at /hbase
>> Cluster up? true
>> In safe mode? false
>> Master address: 172.16.80.26:60000
>> Region server holding ROOT: 172.16.80.27:60020
>> Region servers:
>> - 172.16.80.27:60020
>> - 172.16.80.29:60020
>> - 172.16.80.28:60020
>> =====================================
>>
>> > one of my co-workers apparently can log into his box and submit jobs, but
>> > me or anyone else is still unable to log in.
>>
>> Maybe you're a bit confused; your co-worker seems to be able to use
>> Hadoop Map/Reduce, not HBase.
>>
>> > Does Hbase allow concurrent connections?
>>
>> Yes.
>>
>> >> I think it also says the master is on port 60000
>> >> when the install directions say its supposed to be 60010?
>>
>> Port 60000 is correct. The master uses port 60000 to accept connections
>> from the hbase shell and the region servers. Port 60010 is for the web-based
>> HBase console.
>>
>> > We tried applying this fix (to explicitly set the master):
>> > http://osdir.com/ml/hbase-user-hadoop-apache/2009-05/msg00321.html
>>
>> No, this is an old way to configure a cluster.
>> You shouldn't use this with HBase 0.20.x
>>
>> Thanks,
>>
>> --
>> Tatsuya Kawano (Mr.)
>> Tokyo, Japan
>>
>> On Tue, Nov 10, 2009 at 1:10 PM, Chris Bates <[email protected]> wrote:
>> > Another interesting data point. We tried applying this fix (to explicitly
>> > set the master):
>> > http://osdir.com/ml/hbase-user-hadoop-apache/2009-05/msg00321.html
>> >
>> > But when I log in to the master node, it takes really long to submit a query
>> > and I get this in response:
>> > hbase(main):001:0> list
>> > NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
>> > Trying to contact region server null for region , row '', but failed after 5
>> > attempts.
>> > Exceptions:
>> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>> >
>> > from org/apache/hadoop/hbase/client/HConnectionManager.java:1001:in `getRegionServerWithRetries'
>> > from org/apache/hadoop/hbase/client/MetaScanner.java:55:in `metaScan'
>> > from org/apache/hadoop/hbase/client/MetaScanner.java:28:in `metaScan'
>> > from org/apache/hadoop/hbase/client/HConnectionManager.java:432:in `listTables'
>> > from org/apache/hadoop/hbase/client/HBaseAdmin.java:127:in `listTables'
>> > from sun/reflect/NativeMethodAccessorImpl.java:-2:in `invoke0'
>> > from sun/reflect/NativeMethodAccessorImpl.java:39:in `invoke'
>> > from sun/reflect/DelegatingMethodAccessorImpl.java:25:in `invoke'
>> > from java/lang/reflect/Method.java:597:in `invoke'
>> > from org/jruby/javasupport/JavaMethod.java:298:in `invokeWithExceptionHandling'
>> > from org/jruby/javasupport/JavaMethod.java:259:in `invoke'
>> > from org/jruby/java/invokers/InstanceMethodInvoker.java:36:in `call'
>> > from org/jruby/runtime/callsite/CachingCallSite.java:253:in `cacheAndCall'
>> > from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
>> > from org/jruby/ast/CallNoArgNode.java:61:in `interpret'
>> > from org/jruby/ast/ForNode.java:104:in `interpret'
>> > ... 116 levels...
>> > from opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb#start:-1:in `call'
>> > from org/jruby/internal/runtime/methods/DynamicMethod.java:226:in `call'
>> > from org/jruby/internal/runtime/methods/CompiledMethod.java:211:in `call'
>> > from org/jruby/internal/runtime/methods/CompiledMethod.java:71:in `call'
>> > from org/jruby/runtime/callsite/CachingCallSite.java:253:in `cacheAndCall'
>> > from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
>> > from opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb.rb:497:in `__file__'
>> > from opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb.rb:-1:in `load'
>> > from org/jruby/Ruby.java:577:in `runScript'
>> > from org/jruby/Ruby.java:480:in `runNormally'
>> > from org/jruby/Ruby.java:354:in `runFromMain'
>> > from org/jruby/Main.java:229:in `run'
>> > from org/jruby/Main.java:110:in `run'
>> > from org/jruby/Main.java:94:in `main'
>> > from /opt/hadoop/hbase-0.20.1/bin/../bin/hirb.rb:338:in `list'
>> > from (hbase):2
>> > hbase(main):002:0>
>> >
>> > On Mon, Nov 9, 2009 at 10:52 PM, Chris Bates <[email protected]> wrote:
>> >
>> >> thanks for your response Sujee. These boxes are all on an internal DNS and
>> >> they all resolve.
>> >>
>> >> one of my co-workers apparently can log into his box and submit jobs, but
>> >> me or anyone else is still unable to log in. Does Hbase allow concurrent
>> >> connections? In Hive I remember having to configure the metastore to be in
>> >> server mode if multiple people were using it.
>> >>
>> >> On Mon, Nov 9, 2009 at 10:13 PM, Sujee Maniyam <[email protected]> wrote:
>> >>
>> >>> > [had...@crunch hbase-0.20.1]$ bin/start-hbase.sh
>> >>> >
>> >>> > crunch2: Warning: Permanently added 'crunch2' (RSA) to the list of known
>> >>> > hosts.
>> >>>
>> >>> Is your SSH set up correctly? From the master, you need to be able to
>> >>> log in to all slaves/regionservers without a password.
>> >>>
>> >>> And I see you are using short hostnames (crunch2, crunch3). Do they
>> >>> all resolve correctly? Or do you need to update /etc/hosts to resolve
>> >>> these to an IP address on all machines?
>> >>>
>> >>> regards
>> >>> Sujee Maniyam
>> >>> --
>> >>> http://sujee.net
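
As a rough illustration of the distinction drawn at the top of this reply (quorum members come from hbase.zookeeper.quorum in conf/hbase-site.xml and listen on 2181; region servers come from conf/regionservers, one per line, and listen on 60020), here is a minimal Python sketch that reads both lists from sample file contents. The host names (zk1.example.com, rs1.example.com, ...) and the helper names are made up for the example; only the property name, the file names and the ports come from the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical conf/hbase-site.xml contents; host names are placeholders.
HBASE_SITE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>
</configuration>
"""

# Hypothetical conf/regionservers contents: one host per line.
REGIONSERVERS = """\
rs1.example.com
rs2.example.com
rs3.example.com
rs4.example.com
"""

ZK_CLIENT_PORT = 2181       # port the quorum members listen on (per the thread)
REGION_SERVER_PORT = 60020  # port the region servers listen on (per the thread)


def quorum_members(hbase_site_xml):
    """Return the hosts listed in the hbase.zookeeper.quorum property."""
    root = ET.fromstring(hbase_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == "hbase.zookeeper.quorum":
            value = prop.findtext("value", "")
            return [host.strip() for host in value.split(",") if host.strip()]
    return []


def region_servers(regionservers_text):
    """Return the hosts listed one per line, skipping blank lines."""
    return [line.strip() for line in regionservers_text.splitlines() if line.strip()]


if __name__ == "__main__":
    print("ZooKeeper quorum members:")
    for host in quorum_members(HBASE_SITE_XML):
        print(f"  {host}:{ZK_CLIENT_PORT}")
    print("Region servers:")
    for host in region_servers(REGIONSERVERS):
        print(f"  {host}:{REGION_SERVER_PORT}")
```

The two lists are independent: a machine can appear in one, both, or neither, so a master UI showing 5 quorum servers says nothing about how many region servers have checked in.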
