Thanks everyone for your help. We discovered a couple of things:

1) Our master node was not in the ZK quorum.
2) Our hosts file was such that the regionservers were pinging against themselves, so we removed the offending line from the hosts file and made them go to DNS to resolve their identity. This is still a little unclear to me, as one of my co-workers fixed this issue.
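For anyone who wants to check a box for the same hosts-file problem, here is a rough sketch. It assumes the removed line mapped the machine's own hostname to a loopback address such as 127.0.0.1, which is my guess at what happened and not something I have confirmed. The function names are made up for illustration; only the standard Python socket and ipaddress modules are used.

```python
import ipaddress
import socket


def any_loopback(addresses):
    """Return True if any address string in the list is a loopback address."""
    return any(ipaddress.ip_address(a).is_loopback for a in addresses)


def resolves_to_loopback(hostname):
    """Return True if the local resolver maps hostname to a loopback address.

    Assumption: a regionserver whose own hostname resolves to 127.x.x.x
    may end up talking to itself instead of the rest of the cluster.
    """
    try:
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        # Name could not be resolved at all; that is a different problem.
        return False
    return any_loopback(addresses)


if __name__ == "__main__":
    name = socket.gethostname()
    if resolves_to_loopback(name):
        print(f"{name} resolves to a loopback address; check /etc/hosts")
    else:
        print(f"{name} does not resolve to a loopback address")
```

Run it on each regionserver box. If it reports a loopback address, the hosts file is the first place I would look.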
We had some other problems, probably due to us messing with the configuration files so many times. So I removed HBase from all the boxes. Then I followed these instructions, as stack had suggested:

http://hadoop.apache.org/hbase/docs/r0.20.1/api/overview-summary.html#overview_description

I then scp'd everything over to the other boxes (so ssh was working without a password). The UI works. I was able to run "list" and "create" at the command shell.

One weird thing though is this is my output from zk_dump:

HBase tree in ZooKeeper is rooted at /hbase
Cluster up? true
In safe mode? false
Master address: 172.16.1.46:60000
Region server holding ROOT: 172.16.1.46:60020
Region servers:
- 172.16.1.46:60020

Which says I only have 1 region server. When I check the master UI it says there are 5 servers in the quorum, but only 1 regionserver. All the regionservers are supposed to be on port 2181 like in the Wiki; if I change it to 2222 as someone had mentioned, nothing works. I also have the same regionservers file, listing 5 servers, in the conf directories.

When I check the regionserver UI log on 60030 it says this:

2009-11-10 22:37:31,683 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2009-11-10 22:37:31,708 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: SyncConnected, type: None, path: null
2009-11-10 22:37:31,860 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 172.16.1.46:60000 that we are up
2009-11-10 22:38:03,070 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Master passed us address to use. Was=172.16.1.46:60020, Now=172.16.1.46
2009-11-10 22:38:03,505 INFO org.apache.hadoop.hbase.regionserver.HLog: HLog configuration: blocksize=67108864, rollsize=63753420, enabled=true, flushlogentries=100, optionallogflushinternal=10000ms
2009-11-10 22:38:03,727 INFO org.apache.hadoop.hbase.regionserver.HLog: New hlog /hbase/.logs/chanel2.local,60020,1257910682720/hlog.dat.1257910683505
2009-11-10 22:38:03,759 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=RegionServer, sessionId=regionserver/172.16.1.46:60020
2009-11-10 22:38:03,769 INFO org.apache.hadoop.hbase.regionserver.metrics.RegionServerMetrics: Initialized
2009-11-10 22:38:04,143 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 60030
2009-11-10 22:38:04,144 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 60030 webServer.getConnectors()[0].getLocalPort() returned 60030
2009-11-10 22:38:04,145 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 60030
2009-11-10 22:39:12,514 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server Responder: starting
2009-11-10 22:39:12,515 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server listener on 60020: starting
2009-11-10 22:39:12,517 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60020: starting
2009-11-10 22:39:12,518 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020: starting
2009-11-10 22:39:12,518 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60020: starting
2009-11-10 22:39:12,518 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020: starting
2009-11-10 22:39:12,519 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60020: starting
2009-11-10 22:39:12,519 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60020: starting
2009-11-10 22:39:12,519 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60020: starting
2009-11-10 22:39:12,519 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60020: starting
2009-11-10 22:39:12,520 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60020: starting
2009-11-10 22:39:12,520 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020: starting
2009-11-10 22:39:12,520 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: HRegionServer started at: 172.16.1.46:60020
2009-11-10 22:39:12,532 INFO org.apache.hadoop.hbase.regionserver.StoreFile: Allocating LruBlockCache with maximum size 199.7m
2009-11-10 22:39:12,587 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: -ROOT-,,0
2009-11-10 22:39:12,595 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: -ROOT-,,0
2009-11-10 22:39:12,725 INFO org.apache.hadoop.hbase.regionserver.HRegion: region -ROOT-,,0/70236052 available; sequence id is 3
2009-11-10 22:39:18,700 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: .META.,,1
2009-11-10 22:39:18,706 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN: .META.,,1
2009-11-10 22:39:18,729 INFO org.apache.hadoop.hbase.regionserver.HRegion: region .META.,,1/1028785192 available; sequence id is 0

Another thing I don't understand: if I start and stop HBase, I get this error when I check the Master UI, unless I first delete the old HBase copy in HDFS:

HTTP ERROR: 500
Trying to contact region server null for region , row '', but failed after 3 attempts.
Exceptions:
org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region

RequestURI=/master.jsp

Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server null for region , row '', but failed after 3 attempts.
Exceptions:
org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1001)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:55)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:28)
    at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.listTables(HConnectionManager.java:432)
    at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:127)
    at org.apache.hadoop.hbase.generated.master.master_jsp._jspService(master_jsp.java:125)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:324)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)

On Mon, Nov 9, 2009 at 11:47 PM, Tatsuya Kawano <[email protected]> wrote:

> Hello,
>
> It looks like the master and the region servers are cannot locate each
> other. HBase 0.20.x uses ZooKeeper (zk) to locate other cluster
> members, so maybe your zk has wrong information.
>
> Can you type zk_dump from hbase shell and let us the result?
>
> If the cluster is properly configured, you'll get something like this:
> =====================================
> hbase(main):007:0> zk_dump
>
> HBase tree in ZooKeeper is rooted at /hbase
> Cluster up? true
> In safe mode? false
> Master address: 172.16.80.26:60000
> Region server holding ROOT: 172.16.80.27:60020
> Region servers:
> - 172.16.80.27:60020
> - 172.16.80.29:60020
> - 172.16.80.28:60020
> =====================================
>
>
> > one of my co-workers apparently can log into his box and submit jobs, but
> > me or anyone else is still unable to log in.
>
> Maybe you're a bit confused; your co-worker seems to be able to use
> Hadoop Map/Reduce, not HBase.
>
>
> > Does Hbase allow concurrent connections?
>
> Yes.
>
>
> >> I think it also says the master is on port 60000
> >> when the install directions say its supposed to be 60010?
>
> Port 60000 is correct.
> The master uses port 60000 to accept connection
> from hbase shell and region servers. Port 60010 is for the web-based
> HBase console.
>
>
> > We tried applying this fix (to explicitly set the master):
> > http://osdir.com/ml/hbase-user-hadoop-apache/2009-05/msg00321.html
>
> No, this is an old way to configure a cluster. You shouldn't use this
> with HBase 0.20.x
>
>
> Thanks,
>
> --
> Tatsuya Kawano (Mr.)
> Tokyo, Japan
>
>
>
> On Tue, Nov 10, 2009 at 1:10 PM, Chris Bates
> <[email protected]> wrote:
> > Another interesting data point. We tried applying this fix (to explicitly
> > set the master):
> > http://osdir.com/ml/hbase-user-hadoop-apache/2009-05/msg00321.html
> >
> > But when I log in to the master node, it takes really long to submit a query
> > and I get this in response:
> > hbase(main):001:0> list
> > NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> > Trying to contact region server null for region , row '', but failed after 5
> > attempts.
> > Exceptions:
> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
> > org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
> >
> > from org/apache/hadoop/hbase/client/HConnectionManager.java:1001:in
> > `getRegionServerWithRetries'
> > from org/apache/hadoop/hbase/client/MetaScanner.java:55:in `metaScan'
> > from org/apache/hadoop/hbase/client/MetaScanner.java:28:in `metaScan'
> > from org/apache/hadoop/hbase/client/HConnectionManager.java:432:in
> > `listTables'
> > from org/apache/hadoop/hbase/client/HBaseAdmin.java:127:in `listTables'
> > from sun/reflect/NativeMethodAccessorImpl.java:-2:in `invoke0'
> > from sun/reflect/NativeMethodAccessorImpl.java:39:in `invoke'
> > from sun/reflect/DelegatingMethodAccessorImpl.java:25:in `invoke'
> > from java/lang/reflect/Method.java:597:in `invoke'
> > from org/jruby/javasupport/JavaMethod.java:298:in
> > `invokeWithExceptionHandling'
> > from org/jruby/javasupport/JavaMethod.java:259:in `invoke'
> > from org/jruby/java/invokers/InstanceMethodInvoker.java:36:in `call'
> > from org/jruby/runtime/callsite/CachingCallSite.java:253:in `cacheAndCall'
> > from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
> > from org/jruby/ast/CallNoArgNode.java:61:in `interpret'
> > from org/jruby/ast/ForNode.java:104:in `interpret'
> > ... 116 levels...
> > from
> > opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb#start:-1:in
> > `call'
> > from org/jruby/internal/runtime/methods/DynamicMethod.java:226:in `call'
> > from org/jruby/internal/runtime/methods/CompiledMethod.java:211:in `call'
> > from org/jruby/internal/runtime/methods/CompiledMethod.java:71:in `call'
> > from org/jruby/runtime/callsite/CachingCallSite.java:253:in `cacheAndCall'
> > from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
> > from opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb.rb:497:in
> > `__file__'
> > from opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb.rb:-1:in
> > `load'
> > from org/jruby/Ruby.java:577:in `runScript'
> > from org/jruby/Ruby.java:480:in `runNormally'
> > from org/jruby/Ruby.java:354:in `runFromMain'
> > from org/jruby/Main.java:229:in `run'
> > from org/jruby/Main.java:110:in `run'
> > from org/jruby/Main.java:94:in `main'
> > from /opt/hadoop/hbase-0.20.1/bin/../bin/hirb.rb:338:in `list'
> > from (hbase):2hbase(main):002:0>
> >
> >
> > On Mon, Nov 9, 2009 at 10:52 PM, Chris Bates <
> > [email protected]> wrote:
> >
> >> thanks for your response Sujee. These boxes are all on an internal DNS and
> >> they all resolve.
> >>
> >> one of my co-workers apparently can log into his box and submit jobs, but
> >> me or anyone else is still unable to log in. Does Hbase allow concurrent
> >> connections? In Hive I remember having to configure the metastore to be in
> >> server mode if multiple people were using it.
> >>
> >>
> >> On Mon, Nov 9, 2009 at 10:13 PM, Sujee Maniyam <[email protected]> wrote:
> >>
> >>> > [had...@crunch hbase-0.20.1]$ bin/start-hbase.sh
> >>> >
> >>> > crunch2: Warning: Permanently added 'crunch2' (RSA) to the list of known
> >>> > hosts.
> >>>
> >>>
> >>> is your SSH setup correctly? From master, you need to be able to
> >>> login to all slaves/regionservers without password
> >>>
> >>> And I see you are using short hostnames (crunch2, crunch3), do they
> >>> all resolve correctly? or you need to update /etc/hosts to resolve
> >>> these to an IP address on all machines.
> >>>
> >>> regards
> >>> Sujee Maniyam
> >>> --
> >>> http://sujee.net
> >>>
> >>
> >>
> >
>
