Hi Lars,

By "no logs" I mean that when I ssh into any of the M2-M5 boxes and check the logs folder, there are only ZooKeeper logs and no RS logs (see below). The permissions are OK.
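In case anyone wants to repeat that check without eyeballing each directory, here is a rough sketch in Python. It assumes the log files follow the hbase-<user>-<daemon>-<host> naming seen in the listing further down; the regular expression and the function names are my own illustration, not anything HBase ships.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed naming scheme, inferred from the file names in this thread:
#   hbase-<user>-<daemon>-<host>.log  or  hbase-<user>-<daemon>-<host>.out[.N]
LOG_NAME = re.compile(
    r"^hbase-(?P<user>[^-]+)-(?P<daemon>[^-]+)-(?P<host>.+?)\.(?:log|out)(?:\.\d+)?$"
)


def daemons_with_logs(names):
    """Count log files per daemon name (zookeeper, master, regionserver, ...)."""
    counts = Counter()
    for name in names:
        match = LOG_NAME.match(name)
        if match:  # files that do not follow the assumed pattern are ignored
            counts[match.group("daemon")] += 1
    return counts


def summarize_log_dir(log_dir):
    """Apply daemons_with_logs to the regular files in a logs directory."""
    return daemons_with_logs(p.name for p in Path(log_dir).iterdir() if p.is_file())
```

Calling summarize_log_dir on the logs folder of each box (for example /opt/hadoop/hbase-0.20.1/logs) gives a count per daemon; a missing "regionserver" key is the situation described above.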
This is what I see when I run start-hbase.sh. I can ssh into any of the boxes without a password just fine; it just gives me a weird first-time host message. We get the same thing when we start up Hadoop.

had...@chanel2:/opt/hadoop/hbase-0.20.1$ bin/start-hbase.sh
crunch2: Warning: Permanently added '[crunch2]:2200,[172.16.1.95]:2200' (RSA) to the list of known hosts.
chanel: Warning: Permanently added '[chanel]:2200,[172.16.1.45]:2200' (RSA) to the list of known hosts.
chanel2: Warning: Permanently added '[chanel2]:2200,[172.16.1.46]:2200' (RSA) to the list of known hosts.
chris: Warning: Permanently added '[chris]:2200,[172.16.1.83]:2200' (RSA) to the list of known hosts.
crunch3: Warning: Permanently added '[crunch3]:2200,[172.16.1.96]:2200' (RSA) to the list of known hosts.
chanel: starting zookeeper, logging to /opt/hadoop/hbase-0.20.1/bin/../logs/hbase-hadoop-zookeeper-chanel.out
chanel2: starting zookeeper, logging to /opt/hadoop/hbase-0.20.1/bin/../logs/hbase-hadoop-zookeeper-chanel2.out
chris: starting zookeeper, logging to /opt/hadoop/hbase-0.20.1/bin/../logs/hbase-hadoop-zookeeper-chris.out
crunch2: starting zookeeper, logging to /opt/hadoop/hbase-0.20.1/bin/../logs/hbase-hadoop-zookeeper-crunch2.out
crunch3: starting zookeeper, logging to /opt/hadoop/hbase-0.20.1/bin/../logs/hbase-hadoop-zookeeper-crunch3.out
starting master, logging to /opt/hadoop/hbase-0.20.1/bin/../logs/hbase-hadoop-master-chanel2.out
crunch2: Warning: Permanently added '[crunch2]:2200,[172.16.1.95]:2200' (RSA) to the list of known hosts.
crunch3: Warning: Permanently added '[crunch3]:2200,[172.16.1.96]:2200' (RSA) to the list of known hosts.
chanel: Warning: Permanently added '[chanel]:2200,[172.16.1.45]:2200' (RSA) to the list of known hosts.
chris: Warning: Permanently added '[chris]:2200,[172.16.1.83]:2200' (RSA) to the list of known hosts.
crunch2: regionserver running as process 6950. Stop it first.
chanel: regionserver running as process 22200. Stop it first.
crunch3: regionserver running as process 28962. Stop it first.
chris: regionserver running as process 28719. Stop it first.

Here is the jstack from one of the boxes:

had...@chanel:/opt/hadoop/hbase-0.20.1$ jps
23777 TaskTracker
30449 Jps
23694 DataNode
26747 Main
22200 HRegionServer
30174 HQuorumPeer
had...@chanel:/opt/hadoop/hbase-0.20.1$ jstack 22200
2009-11-11 03:43:56
Full thread dump Java HotSpot(TM) Server VM (14.2-b01 mixed mode):

"Attach Listener" daemon prio=10 tid=0x083f8000 nid=0x7709 waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

"main-EventThread" daemon prio=10 tid=0x6e586400 nid=0x56e3 waiting on condition [0x6e4ad000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x73865330> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)

"main-SendThread" daemon prio=10 tid=0x6e572400 nid=0x56e2 waiting on condition [0x6e4fe000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:851)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:895)

"Low Memory Detector" daemon prio=10 tid=0x0813ac00 nid=0x56dd runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread1" daemon prio=10 tid=0x08139000 nid=0x56dc waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x08136400 nid=0x56db waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x08134c00 nid=0x56da runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Surrogate Locker Thread (CMS)" daemon prio=10 tid=0x08133400 nid=0x56d9 waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x0811f800 nid=0x56d8 in Object.wait() [0x6ec75000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x73860458> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
    - locked <0x73860458> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x0811e400 nid=0x56d7 in Object.wait() [0x6ecc6000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x738657e0> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0x738657e0> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x0805a800 nid=0x56d2 waiting on condition [0xb72f2000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hbase.util.Sleeper.sleep(Sleeper.java:74)
    at org.apache.hadoop.hbase.util.Sleeper.sleep(Sleeper.java:51)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.watchMasterAddress(HRegionServer.java:387)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.reinitializeZooKeeper(HRegionServer.java:315)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.reinitialize(HRegionServer.java:306)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:276)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.doMain(HRegionServer.java:2472)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2540)

"VM Thread" prio=10 tid=0x0811a400 nid=0x56d6 runnable

"Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x0805e400 nid=0x56d3 runnable

"Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x0805fc00 nid=0x56d4 runnable

"Concurrent Mark-Sweep GC Thread" prio=10 tid=0x080cd800 nid=0x56d5 runnable

"VM Periodic Task Thread" prio=10 tid=0x0813cc00 nid=0x56de waiting on condition

JNI global references: 691

had...@chanel:/opt/hadoop/hbase-0.20.1$ ls -l
total 3628
drwxr-xr-x 2 hadoop hadoop 4096 2009-11-10 21:41 bin
-rw-r--r-- 1 hadoop hadoop 21416 2009-11-10 21:41 build.xml
-rw-r--r-- 1 hadoop hadoop 115584 2009-11-10 21:41 CHANGES.txt
drwxr-xr-x 2 hadoop hadoop 4096 2009-11-11 02:00 conf
drwxr-xr-x 4 hadoop hadoop 4096 2009-11-10 21:41 contrib
drwxr-xr-x 5 hadoop hadoop 4096 2009-11-10 21:41 docs
-rw-r--r-- 1 hadoop hadoop 1544829 2009-11-10 21:41 hbase-0.20.1.jar
-rw-r--r-- 1 hadoop hadoop 1954331 2009-11-10 21:41 hbase-0.20.1-test.jar
drwxr-xr-x 4 hadoop hadoop 4096 2009-11-10 21:41 lib
-rw-r--r-- 1 hadoop hadoop 11358 2009-11-10 21:41 LICENSE.txt
drwxr-xr-x 2 hadoop hadoop 4096 2009-11-11 03:38 logs
-rw-r--r-- 1 hadoop hadoop 1741 2009-11-10 21:41 NOTICE.txt
-rw-r--r-- 1 hadoop hadoop 43 2009-11-10 21:41 README.txt
drwxr-xr-x 8 hadoop hadoop 4096 2009-11-10 21:41 src
drwxr-xr-x 6 hadoop hadoop 4096 2009-11-10 21:41 webapps
had...@chanel:/opt/hadoop/hbase-0.20.1$ cd logs/
had...@chanel:/opt/hadoop/hbase-0.20.1/logs$ ll
total 72
-rw-r--r-- 1 hadoop hadoop 66759 2009-11-11 03:38 hbase-hadoop-zookeeper-chanel.log
-rw-r--r-- 1 hadoop hadoop 0 2009-11-11 03:38 hbase-hadoop-zookeeper-chanel.out
-rw-r--r-- 1 hadoop hadoop 0 2009-11-11 03:00 hbase-hadoop-zookeeper-chanel.out.1
-rw-r--r-- 1 hadoop hadoop 0 2009-11-11 02:56 hbase-hadoop-zookeeper-chanel.out.2
-rw-r--r-- 1 hadoop hadoop 0 2009-11-11 02:36 hbase-hadoop-zookeeper-chanel.out.3
-rw-r--r-- 1 hadoop hadoop 0 2009-11-11 02:18 hbase-hadoop-zookeeper-chanel.out.4

On Wed, Nov 11, 2009 at 3:15 AM, Lars George <[email protected]> wrote:

> Chris,
>
> What do you mean there are no region server logs? On the M2-M5 you have no
> logs? Is the Java process for the RS running? If so, could you jstack it to
> see where it hangs?
>
> Maybe you have an access/owner issue with the log dirs on the RS machines?
>
> The master log looks OK.
>
> Lars
>
> Chris Bates wrote:
>
>> Again, I really appreciate the help. I removed the master from the region
>> server list and made sure the rest of the machines had an updated list. No
>> region servers still:
>> hbase(main):001:0> zk_dump
>>
>> HBase tree in ZooKeeper is rooted at /hbase
>> Cluster up? true
>> In safe mode? true
>> Master address: 172.16.1.46:60000
>> Region server holding ROOT: 172.16.1.46:60020
>> Region servers:
>>
>> hbase(main):002:0> status 'simple'
>> 0 live servers
>> 0 dead servers
>>
>> I checked the /etc/hosts file on all machines and they all have 127.0.0.1
>> localhost.localdomain localhost and then their other mappings for other
>> domains, with the box name mapping removed.
>>
>> There are no regionserver logs.
>> But the master log is this:
>> 2009-11-11 03:02:34,798 INFO org.apache.hadoop.hbase.master.RegionManager: -ROOT- region unset (but not set to be reassigned)
>> 2009-11-11 03:02:34,799 INFO org.apache.hadoop.hbase.master.RegionManager: ROOT inserted into regionsInTransition
>> 2009-11-11 03:02:35,078 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server chanel2/172.16.1.46:2181
>> 2009-11-11 03:02:35,078 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/172.16.1.46:53335 remote=chanel2/172.16.1.46:2181]
>> 2009-11-11 03:02:35,078 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
>> 2009-11-11 03:02:35,179 INFO org.apache.hadoop.hbase.master.HMaster: HMaster initialized on 172.16.1.46:60000
>> 2009-11-11 03:02:35,197 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=Master, sessionId=HMaster
>> 2009-11-11 03:02:35,198 INFO org.apache.hadoop.hbase.master.metrics.MasterMetrics: Initialized
>> 2009-11-11 03:02:35,373 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 60010
>> 2009-11-11 03:02:35,374 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 60010 webServer.getConnectors()[0].getLocalPort() returned 60010
>> 2009-11-11 03:02:35,374 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 60010
>> 2009-11-11 03:02:52,692 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server Responder: starting
>> 2009-11-11 03:02:52,693 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server listener on 60000: starting
>> 2009-11-11 03:02:52,695 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60000: starting
>> 2009-11-11 03:02:52,695 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60000: starting
>> 2009-11-11 03:02:52,696 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60000: starting
>> 2009-11-11 03:02:52,696 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60000: starting
>> 2009-11-11 03:02:52,696 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60000: starting
>> 2009-11-11 03:02:52,697 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60000: starting
>> 2009-11-11 03:02:52,697 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60000: starting
>> 2009-11-11 03:02:52,697 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60000: starting
>> 2009-11-11 03:02:52,698 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60000: starting
>> 2009-11-11 03:02:52,698 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60000: starting
>> 2009-11-11 03:03:34,719 INFO org.apache.hadoop.hbase.master.ServerManager: 0 region servers, 0 dead, average load NaN
>> 2009-11-11 03:03:35,200 INFO org.apache.hadoop.hbase.master.BaseScanner: All 0 .META.
>> region(s) scanned
>>
>> On Wed, Nov 11, 2009 at 2:39 AM, Jeff Zhang <[email protected]> wrote:
>>
>>> Hi Jean,
>>>
>>> Thank you, after I remove the mapping from sha-cs-03 stuff to localhost it
>>> works.
>>>
>>> But I installed hadoop successfully on these machines before, is hbase
>>> different from hadoop about the ip mapping ?
>>>
>>> Jeff Zhang
>>>
>>> On Wed, Nov 11, 2009 at 1:29 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>>
>>>> Check your OS networking configuration, make sure stuff doesn't resolve
>>>> to localhost or 127.0.0.1 or 127.0.1.1
>>>>
>>>> Also you said you can't run the list, what does it do then?
>>>>
>>>> J-D
>>>>
>>>> On Tue, Nov 10, 2009 at 9:23 PM, Jeff Zhang <[email protected]> wrote:
>>>>
>>>>> *I configure the regionservers in the file regionservers as following:*
>>>>>
>>>>> sha-cs-01
>>>>> sha-cs-02
>>>>> sha-cs-03
>>>>> sha-cs-05
>>>>> sha-cs-06
>>>>>
>>>>> *And also I configure the zookeeper in file hbase-site.xml as following:*
>>>>>
>>>>> <configuration>
>>>>>   <property>
>>>>>     <name>hbase.cluster.distributed</name>
>>>>>     <value>true</value>
>>>>>     <description>The mode the cluster will be in. Possible values are
>>>>>       false: standalone and pseudo-distributed setups with managed Zookeeper
>>>>>       true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
>>>>>     </description>
>>>>>   </property>
>>>>>   <property>
>>>>>     <name>hbase.zookeeper.property.clientPort</name>
>>>>>     <value>2222</value>
>>>>>     <description>Property from ZooKeeper's config zoo.cfg.
>>>>>       The port at which the clients will connect.
>>>>>     </description>
>>>>>   </property>
>>>>>   <property>
>>>>>     <name>hbase.zookeeper.quorum</name>
>>>>>     <value>*sha-cs-01,sha-cs-02,sha-cs-03,sha-cs-04,sha-cs-06*</value>
>>>>>     <description>Comma separated list of servers in the ZooKeeper Quorum.
>>>>>       For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
>>>>>       By default this is set to localhost for local and pseudo-distributed modes
>>>>>       of operation. For a fully-distributed setup, this should be set to a full
>>>>>       list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
>>>>>       this is the list of servers which we will start/stop ZooKeeper on.
>>>>>     </description>
>>>>>   </property>
>>>>>   <property>
>>>>>     <name>hbase.rootdir</name>
>>>>>     <value>hdfs://sha-cs-04:9000/hbase</value>
>>>>>     <description>The directory shared by region servers.
>>>>>     </description>
>>>>>   </property>
>>>>> </configuration>
>>>>>
>>>>> I still do not understand what's wrong with my configuration ?
>>>>>
>>>>> Jeff Zhang
>>>>>
>>>>> On Wed, Nov 11, 2009 at 12:56 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>>>>
>>>>>> Please read my answer to Chris (wrote about 10-15 minutes ago), you
>>>>>> also seem to confuse regionservers and zookeeper quorum members.
>>>>>>
>>>>>> In this case it also seems some region servers registered themselves
>>>>>> as localhost and then with their good address the master probably gave
>>>>>> them. Please check your OS network configurations and make sure the
>>>>>> hostname points at the right place.
>>>>>>
>>>>>> J-D
>>>>>>
>>>>>> On Tue, Nov 10, 2009 at 8:47 PM, Jeff Zhang <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Jean,
>>>>>>>
>>>>>>> I try the hbase 0.20.2, I look the logs, it seems the master and the
>>>>>>> regions work.
>>>>>>>
>>>>>>> But I can not run list command on hbase shell. When I invoke command status
>>>>>>> 'simple' on hbase shell.
>>>>>>> It shows the following message:
>>>>>>> 09/11/11 12:42:55 DEBUG client.HConnectionManager$ClientZKWatcher: Got ZooKeeper event, state: SyncConnected, type: None, path: null
>>>>>>> 09/11/11 12:42:55 DEBUG zookeeper.ZooKeeperWrapper: Read ZNode /hbase/master got 10.148.224.13:60000
>>>>>>> 8 servers, 0 dead, 0.1250 average load
>>>>>>> hbase(main):002:0> status 'simple'
>>>>>>> 8 live servers
>>>>>>>     localhost:60020 1257914319445
>>>>>>>         requests=0, regions=0, usedHeap=0, maxHeap=0
>>>>>>>     sha-cs-03:60020 1257914321331
>>>>>>>         requests=0, regions=0, usedHeap=33, maxHeap=991
>>>>>>>     localhost:60020 1257914320265
>>>>>>>         requests=0, regions=0, usedHeap=0, maxHeap=0
>>>>>>>     sha-cs-01:60020 1257914320551
>>>>>>>         requests=0, regions=1, usedHeap=34, maxHeap=991
>>>>>>>     sha-cs-05:60020 1257914322656
>>>>>>>         requests=0, regions=0, usedHeap=33, maxHeap=991
>>>>>>>     sha-cs-06:60020 1257914321467
>>>>>>>         requests=0, regions=0, usedHeap=34, maxHeap=991
>>>>>>>     localhost:60020 1257914320202
>>>>>>>         requests=0, regions=0, usedHeap=0, maxHeap=0
>>>>>>>     localhost:60020 1257914321532
>>>>>>>         requests=0, regions=0, usedHeap=0, maxHeap=0
>>>>>>>
>>>>>>> It's weird that I have 3 localhost zookeeper here; actually I set
>>>>>>> 5 machines on hbase.zookeeper.quorum
>>>>>>>
>>>>>>> Jeff Zhang
>>>>>>>
>>>>>>> On Wed, Nov 11, 2009 at 9:47 AM, Jean-Daniel Cryans <[email protected]> wrote:
>>>>>>>
>>>>>>>> This particular problem is fixed in the current 0.20 branch and we
>>>>>>>> just released a candidate for 0.20.2, you can get it here
>>>>>>>> http://people.apache.org/~jdcryans/hbase-0.20.2-candidate-1/
>>>>>>>>
>>>>>>>> J-D
>>>>>>>>
>>>>>>>> On Tue, Nov 10, 2009 at 5:43 PM, Jeff Zhang <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> The following is the region server's log :
>>>>>>>>>
>>>>>>>>> 2009-11-10 18:09:08,062 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020: starting
>>>>>>>>> 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60020: starting
>>>>>>>>> 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60020: starting
>>>>>>>>> 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60020: starting
>>>>>>>>> 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60020: starting
>>>>>>>>> 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60020: starting
>>>>>>>>> 2009-11-10 18:09:08,063 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: HRegionServer started at: 10.148.224.11:60020
>>>>>>>>> 2009-11-10 18:09:08,064 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020: starting
>>>>>>>>> 2009-11-10 18:09:08,070 INFO org.apache.hadoop.hbase.regionserver.StoreFile:
>>>>>>>>> Allocating LruBlockCache with maximum size 198.3m
>>>>>>>>> 2009-11-10 18:09:08,095 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_CALL_SERVER_STARTUP
>>>>>>>>> 2009-11-10 18:09:08,229 INFO org.apache.hadoop.hbase.regionserver.HLog: HLog configuration: blocksize=67108864, rollsize=63753420, enabled=true, flushlogentries=100, optionallogflushinternal=10000ms
>>>>>>>>> 2009-11-10 18:09:08,253 INFO org.apache.hadoop.hbase.regionserver.HLog: New hlog /hbase/.logs/10.148.224.11,60020,1257847748205/hlog.dat.1257847748229
>>>>>>>>> 2009-11-10 18:09:08,255 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 10.148.224.13:60000 that we are up
>>>>>>>>> 2009-11-10 18:09:08,302 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Unhandled exception. Aborting...
>>>>>>>>> java.lang.NullPointerException
>>>>>>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:459)
>>>>>>>>>     at java.lang.Thread.run(Thread.java:619)
>>>>>>>>> 2009-11-10 18:09:08,304 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=0, stores=0, storefiles=0, storefileIndexSize=0, memstoreSize=0, usedHeap=31, maxHeap=991, blockCacheSize=1707288, blockCacheFree=206264664, blockCacheCount=0, blockCacheHitRatio=0
>>>>>>>>> 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
>>>>>>>>> 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60020: exiting
>>>>>>>>> 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60020
>>>>>>>>> 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020: exiting
>>>>>>>>> 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60020: exiting
>>>>>>>>> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020: exiting
>>>>>>>>> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60020: exiting
>>>>>>>>> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60020: exiting
>>>>>>>>> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60020: exiting
>>>>>>>>> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60020: exiting
>>>>>>>>> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60020: exiting
>>>>>>>>> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020: exiting
>>>>>>>>> 2009-11-10 18:09:08,306 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
>>>>>>>>> 2009-11-10 18:09:08,307 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
>>>>>>>>> 2009-11-10 18:09:08,412 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: regionserver/127.0.0.1:60020.cacheFlusher exiting
>>>>>>>>> 2009-11-10 18:09:08,412 INFO org.apache.hadoop.hbase.regionserver.LogFlusher: regionserver/127.0.0.1:60020.logFlusher exiting
>>>>>>>>> 2009-11-10 18:09:08,412 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: regionserver/127.0.0.1:60020.compactor exiting
>>>>>>>>> 2009-11-10 18:09:08,412 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
>>>>>>>>> 2009-11-10 18:09:08,413 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker: regionserver/127.0.0.1:60020.majorCompactionChecker exiting
>>>>>>>>> 2009-11-10 18:09:08,427 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: On abort, closed hlog
>>>>>>>>> 2009-11-10 18:09:08,428 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at: 10.148.224.11:60020
>>>>>>>>> 2009-11-10 18:09:17,489 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
>>>>>>>>> 2009-11-10 18:09:17,489 INFO org.apache.zookeeper.ZooKeeper: Closing session: 0x324dcceb05c0003
>>>>>>>>> 2009-11-10 18:09:17,490 INFO org.apache.zookeeper.ClientCnxn: Closing ClientCnxn for session: 0x324dcceb05c0003
>>>>>>>>> 2009-11-10 18:09:17,495 INFO org.apache.hadoop.hbase.Leases: regionserver/127.0.0.1:60020.leaseChecker closing leases
>>>>>>>>> 2009-11-10 18:09:17,495 INFO org.apache.hadoop.hbase.Leases: regionserver/127.0.0.1:60020.leaseChecker closed leases
>>>>>>>>> 2009-11-10 18:09:17,500 INFO org.apache.zookeeper.ClientCnxn: Exception while closing send thread for session 0x324dcceb05c0003 : Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
>>>>>>>>> 2009-11-10 18:09:17,604 INFO org.apache.zookeeper.ClientCnxn: Disconnecting ClientCnxn for session: 0x324dcceb05c0003
>>>>>>>>> 2009-11-10 18:09:17,604 INFO org.apache.zookeeper.ZooKeeper:
>>>>>>>>> Session: 0x324dcceb05c0003 closed
>>>>>>>>> 2009-11-10 18:09:17,605 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/127.0.0.1:60020 exiting
>>>>>>>>> 2009-11-10 18:09:17,605 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>>>>>>>>> 2009-11-10 18:09:17,606 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
>>>>>>>>> 2009-11-10 18:09:17,606 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
>>>>>>>>>
>>>>>>>>> On Tue, Nov 10, 2009 at 10:55 PM, Andrew Purtell <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> When you try to start the region servers, what do you see in the log?
>>>>>>>>>>
>>>>>>>>>> If you don't change the client port (hbase.zookeeper.property.clientPort),
>>>>>>>>>> does it work?
>>>>>>>>>>
>>>>>>>>>> - Andy
>>>>>>>>>>
>>>>>>>>>> ________________________________
>>>>>>>>>> From: Jeff Zhang <[email protected]>
>>>>>>>>>> To: [email protected]
>>>>>>>>>> Sent: Tue, November 10, 2009 2:40:28 PM
>>>>>>>>>> Subject: Re: HBase 0.20.1 Distributed Install Problems
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I meet the same problem that I can not start the regionserver.
>>>>>>>>>>
>>>>>>>>>> When I invoke zk_dump
>>>>>>>>>>
>>>>>>>>>> it shows:
>>>>>>>>>>
>>>>>>>>>> HBase tree in ZooKeeper is rooted at /hbase
>>>>>>>>>> Cluster up? true
>>>>>>>>>> In safe mode? true
>>>>>>>>>> Master address: 10.148.224.13:60000
>>>>>>>>>> Region server holding ROOT: null
>>>>>>>>>> Region servers:
>>>>>>>>>>
>>>>>>>>>> The following is my hbase-site.xml
>>>>>>>>>>
>>>>>>>>>> <configuration>
>>>>>>>>>>   <property>
>>>>>>>>>>     <name>hbase.cluster.distributed</name>
>>>>>>>>>>     <value>true</value>
>>>>>>>>>>     <description>The mode the cluster will be in. Possible values are
>>>>>>>>>>       false: standalone and pseudo-distributed setups with managed Zookeeper
>>>>>>>>>>       true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
>>>>>>>>>>     </description>
>>>>>>>>>>   </property>
>>>>>>>>>>   <property>
>>>>>>>>>>     <name>hbase.rootdir</name>
>>>>>>>>>>     <value>hdfs://sha-cs-04:9000/hbase</value>
>>>>>>>>>>     <description>The directory shared by region servers.
>>>>>>>>>>     </description>
>>>>>>>>>>   </property>
>>>>>>>>>>   <property>
>>>>>>>>>>     <name>hbase.zookeeper.property.clientPort</name>
>>>>>>>>>>     <value>2222</value>
>>>>>>>>>>     <description>Property from ZooKeeper's config zoo.cfg.
>>>>>>>>>>       The port at which the clients will connect.
>>>>>>>>>>     </description>
>>>>>>>>>>   </property>
>>>>>>>>>>   <property>
>>>>>>>>>>     <name>hbase.zookeeper.quorum</name>
>>>>>>>>>>     <value>sha-cs-01,sha-cs-02,sha-cs-03,sha-cs-05,sha-cs-06</value>
>>>>>>>>>>     <description>Comma separated list of servers in the ZooKeeper Quorum.
>>>>>>>>>>       For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
>>>>>>>>>>       By default this is set to localhost for local and pseudo-distributed modes
>>>>>>>>>>       of operation.
For a fully-distributed setup, this should be >>>>>>>>>> >>>>>>>>>> >>>>>>>>> set >>>> >>>> >>>>> to >>>>>> >>>>>> >>>>>>> a >>>>>>>> >>>>>>>> >>>>>>>>> full >>>>>>>>>> list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is >>>>>>>>>> >>>>>>>>>> >>>>>>>>> set >>> >>> >>>> in >>>> >>>> >>>>> hbase-env.sh >>>>>>>>>> this is the list of servers which we will start/stop >>>>>>>>>> >>>>>>>>>> >>>>>>>>> ZooKeeper >>>> >>>> >>>>> on. >>>>>> >>>>>> >>>>>>> </description> >>>>>>>>>> </property> >>>>>>>>>> >>>>>>>>>> </configuration> >>>>>>>>>> >>>>>>>>>> What's wrong with my configuration ? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thank you in advance. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Jeff Zhang >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Nov 10, 2009 at 12:47 PM, Tatsuya Kawano >>>>>>>>>> <[email protected]>wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Hello, >>>>>>>>>>> >>>>>>>>>>> It looks like the master and the region servers are cannot >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> locate >>>> >>>> >>>>> each >>>>>> >>>>>> >>>>>>> other. HBase 0.20.x uses ZooKeeper (zk) to locate other >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> cluster >>> >>> >>>> members, so maybe your zk has wrong information. >>>>>>>>>>> >>>>>>>>>>> Can you type zk_dump from hbase shell and let us the result? >>>>>>>>>>> >>>>>>>>>>> If the cluster is properly configured, you'll get something >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> like >>> >>> >>>> this: >>>>>> >>>>>> >>>>>>> ===================================== >>>>>>>>>>> hbase(main):007:0> zk_dump >>>>>>>>>>> >>>>>>>>>>> HBase tree in ZooKeeper is rooted at /hbase >>>>>>>>>>> Cluster up? true >>>>>>>>>>> In safe mode? 
false
>>>>>>>>>>> Master address: 172.16.80.26:60000
>>>>>>>>>>> Region server holding ROOT: 172.16.80.27:60020
>>>>>>>>>>> Region servers:
>>>>>>>>>>> - 172.16.80.27:60020
>>>>>>>>>>> - 172.16.80.29:60020
>>>>>>>>>>> - 172.16.80.28:60020
>>>>>>>>>>> =====================================
>>>>>>>>>>>
>>>>>>>>>>>> one of my co-workers apparently can log into his box and submit jobs,
>>>>>>>>>>>> but me or anyone else is still unable to log in.
>>>>>>>>>>>
>>>>>>>>>>> Maybe you're a bit confused; your co-worker seems to be able to use
>>>>>>>>>>> Hadoop Map/Reduce, not HBase.
>>>>>>>>>>>
>>>>>>>>>>>> Does Hbase allow concurrent connections?
>>>>>>>>>>>
>>>>>>>>>>> Yes.
>>>>>>>>>>>
>>>>>>>>>>>> I think it also says the master is on port 60000
>>>>>>>>>>>> when the install directions say it's supposed to be 60010?
>>>>>>>>>>>
>>>>>>>>>>> Port 60000 is correct. The master uses port 60000 to accept connections
>>>>>>>>>>> from hbase shell and region servers. Port 60010 is for the web-based
>>>>>>>>>>> HBase console.
>>>>>>>>>>>
>>>>>>>>>>>> We tried applying this fix (to explicitly set the master):
>>>>>>>>>>>> http://osdir.com/ml/hbase-user-hadoop-apache/2009-05/msg00321.html
>>>>>>>>>>>
>>>>>>>>>>> No, this is an old way to configure a cluster. You shouldn't use this
>>>>>>>>>>> with HBase 0.20.x
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Tatsuya Kawano (Mr.)
>>>>>>>>>>> Tokyo, Japan
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Nov 10, 2009 at 1:10 PM, Chris Bates <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Another interesting data point. We tried applying this fix (to explicitly
>>>>>>>>>>>> set the master):
>>>>>>>>>>>> http://osdir.com/ml/hbase-user-hadoop-apache/2009-05/msg00321.html
>>>>>>>>>>>>
>>>>>>>>>>>> But when I log in to the master node, it takes really long to submit a query
>>>>>>>>>>>> and I get this in response:
>>>>>>>>>>>> hbase(main):001:0> list
>>>>>>>>>>>> NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
>>>>>>>>>>>> Trying to contact region server null for region , row '', but failed after 5
>>>>>>>>>>>> attempts.
>>>>>>>>>>>> Exceptions:
>>>>>>>>>>>> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>>>>>>>>>>>> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>>>>>>>>>>>> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>>>>>>>>>>>> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>>>>>>>>>>>> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying to locate root region
>>>>>>>>>>>>
>>>>>>>>>>>> from org/apache/hadoop/hbase/client/HConnectionManager.java:1001:in `getRegionServerWithRetries'
>>>>>>>>>>>> from org/apache/hadoop/hbase/client/MetaScanner.java:55:in `metaScan'
>>>>>>>>>>>> from org/apache/hadoop/hbase/client/MetaScanner.java:28:in `metaScan'
>>>>>>>>>>>> from org/apache/hadoop/hbase/client/HConnectionManager.java:432:in `listTables'
>>>>>>>>>>>> from org/apache/hadoop/hbase/client/HBaseAdmin.java:127:in `listTables'
>>>>>>>>>>>> from sun/reflect/NativeMethodAccessorImpl.java:-2:in
`invoke0'
>>>>>>>>>>>> from sun/reflect/NativeMethodAccessorImpl.java:39:in `invoke'
>>>>>>>>>>>> from sun/reflect/DelegatingMethodAccessorImpl.java:25:in `invoke'
>>>>>>>>>>>> from java/lang/reflect/Method.java:597:in `invoke'
>>>>>>>>>>>> from org/jruby/javasupport/JavaMethod.java:298:in `invokeWithExceptionHandling'
>>>>>>>>>>>> from org/jruby/javasupport/JavaMethod.java:259:in `invoke'
>>>>>>>>>>>> from org/jruby/java/invokers/InstanceMethodInvoker.java:36:in `call'
>>>>>>>>>>>> from org/jruby/runtime/callsite/CachingCallSite.java:253:in `cacheAndCall'
>>>>>>>>>>>> from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
>>>>>>>>>>>> from org/jruby/ast/CallNoArgNode.java:61:in `interpret'
>>>>>>>>>>>> from org/jruby/ast/ForNode.java:104:in `interpret'
>>>>>>>>>>>> ... 116 levels...
>>>>>>>>>>>> from opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb#start:-1:in `call'
>>>>>>>>>>>> from org/jruby/internal/runtime/methods/DynamicMethod.java:226:in `call'
>>>>>>>>>>>> from org/jruby/internal/runtime/methods/CompiledMethod.java:211:in `call'
>>>>>>>>>>>> from org/jruby/internal/runtime/methods/CompiledMethod.java:71:in `call'
>>>>>>>>>>>> from org/jruby/runtime/callsite/CachingCallSite.java:253:in `cacheAndCall'
>>>>>>>>>>>> from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
>>>>>>>>>>>> from opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb.rb:497:in `__file__'
>>>>>>>>>>>> from opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb.rb:-1:in `load'
>>>>>>>>>>>> from org/jruby/Ruby.java:577:in `runScript'
>>>>>>>>>>>> from org/jruby/Ruby.java:480:in `runNormally'
>>>>>>>>>>>> from org/jruby/Ruby.java:354:in `runFromMain'
>>>>>>>>>>>> from org/jruby/Main.java:229:in `run'
>>>>>>>>>>>> from org/jruby/Main.java:110:in `run'
>>>>>>>>>>>> from org/jruby/Main.java:94:in `main'
>>>>>>>>>>>> from /opt/hadoop/hbase-0.20.1/bin/../bin/hirb.rb:338:in `list'
>>>>>>>>>>>> from (hbase):2
>>>>>>>>>>>> hbase(main):002:0>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Nov 9, 2009 at 10:52 PM, Chris Bates <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> thanks for your response Sujee.
>>>>>>>>>>>>> These boxes are all on an internal DNS and they all resolve.
>>>>>>>>>>>>>
>>>>>>>>>>>>> one of my co-workers apparently can log into his box and submit jobs, but
>>>>>>>>>>>>> me or anyone else is still unable to log in. Does Hbase allow concurrent
>>>>>>>>>>>>> connections? In Hive I remember having to configure the metastore to be in
>>>>>>>>>>>>> server mode if multiple people were using it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Nov 9, 2009 at 10:13 PM, Sujee Maniyam <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [had...@crunch hbase-0.20.1]$ bin/start-hbase.sh
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> crunch2: Warning: Permanently added 'crunch2' (RSA) to the list of known
>>>>>>>>>>>>>>> hosts.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is your SSH set up correctly? From the master, you need to be able to
>>>>>>>>>>>>>> log in to all slaves/regionservers without a password.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And I see you are using short hostnames (crunch2, crunch3); do they
>>>>>>>>>>>>>> all resolve correctly?
>>>>>>>>>>>>>> or you need to update /etc/hosts to resolve
>>>>>>>>>>>>>> these to an IP address on all machines.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> regards
>>>>>>>>>>>>>> Sujee Maniyam
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> http://sujee.net
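
Editor's sketch, not from any poster in this thread: the two checks discussed above (do the short hostnames resolve on every machine, and is a given port accepting connections) can be scripted roughly as follows. The helper names `resolve_all` and `port_open` are hypothetical, and the hostnames and ports mentioned afterwards are simply the ones quoted in the thread. It only tests name resolution and TCP reachability, so it cannot confirm that HBase or ZooKeeper themselves are healthy.

```python
import socket


def resolve_all(hosts, resolver=socket.gethostbyname):
    """Map each hostname to its IP address, or None if it does not resolve.

    `resolver` is injectable (an assumption made for offline testing);
    by default the system resolver is used, which also honours /etc/hosts.
    """
    result = {}
    for host in hosts:
        try:
            result[host] = resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            result[host] = None
    return result


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or unresolvable
        return False
```

Run on the master and on each region server, something like `resolve_all(["crunch2", "crunch3", "chanel", "chanel2", "chris"])` would show any name that comes back as `None` on that machine, and `port_open(master_host, 60000)` would show whether the master port mentioned by Tatsuya is reachable from there (`master_host` being whichever hostname your master runs on).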
