Re: HBase 0.20.1 Distributed Install Problems

Chris Bates Wed, 11 Nov 2009 00:52:57 -0800

Hi Lars,

By no logs I mean that when I ssh into any of the M2-M5 boxes and check the
logs folder, there are only ZooKeeper logs, no RS logs (see below).  The
permissions are OK.


This is what I see when I run start-hbase.sh. I can ssh into any of the
boxes without a password just fine; it only prints the first-time "known
hosts" warning, and we get the same thing when we start up Hadoop.

had...@chanel2:/opt/hadoop/hbase-0.20.1$ bin/start-hbase.sh
crunch2: Warning: Permanently added '[crunch2]:2200,[172.16.1.95]:2200' (RSA) to the list of known hosts.
chanel: Warning: Permanently added '[chanel]:2200,[172.16.1.45]:2200' (RSA) to the list of known hosts.
chanel2: Warning: Permanently added '[chanel2]:2200,[172.16.1.46]:2200' (RSA) to the list of known hosts.
chris: Warning: Permanently added '[chris]:2200,[172.16.1.83]:2200' (RSA) to the list of known hosts.
crunch3: Warning: Permanently added '[crunch3]:2200,[172.16.1.96]:2200' (RSA) to the list of known hosts.
chanel: starting zookeeper, logging to /opt/hadoop/hbase-0.20.1/bin/../logs/hbase-hadoop-zookeeper-chanel.out
chanel2: starting zookeeper, logging to /opt/hadoop/hbase-0.20.1/bin/../logs/hbase-hadoop-zookeeper-chanel2.out
chris: starting zookeeper, logging to /opt/hadoop/hbase-0.20.1/bin/../logs/hbase-hadoop-zookeeper-chris.out
crunch2: starting zookeeper, logging to /opt/hadoop/hbase-0.20.1/bin/../logs/hbase-hadoop-zookeeper-crunch2.out
crunch3: starting zookeeper, logging to /opt/hadoop/hbase-0.20.1/bin/../logs/hbase-hadoop-zookeeper-crunch3.out
starting master, logging to /opt/hadoop/hbase-0.20.1/bin/../logs/hbase-hadoop-master-chanel2.out
crunch2: Warning: Permanently added '[crunch2]:2200,[172.16.1.95]:2200' (RSA) to the list of known hosts.
crunch3: Warning: Permanently added '[crunch3]:2200,[172.16.1.96]:2200' (RSA) to the list of known hosts.
chanel: Warning: Permanently added '[chanel]:2200,[172.16.1.45]:2200' (RSA) to the list of known hosts.
chris: Warning: Permanently added '[chris]:2200,[172.16.1.83]:2200' (RSA) to the list of known hosts.
crunch2: regionserver running as process 6950. Stop it first.
chanel: regionserver running as process 22200. Stop it first.
crunch3: regionserver running as process 28962. Stop it first.
chris: regionserver running as process 28719. Stop it first.
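
As far as I can tell, the "Stop it first." lines mean the start script found a
live process for each region server's recorded PID, so no new region server
was launched on those hosts (PID 22200 on chanel matches the HRegionServer in
the jps output further down). A minimal sketch for pulling those hosts and
PIDs out of captured start-hbase.sh output, assuming Python is available; the
function name and the sample text are only for illustration:

```python
import re

# Matches lines such as:
#   chanel: regionserver running as process 22200. Stop it first.
ALREADY_RUNNING = re.compile(
    r"^(?P<host>\S+): regionserver running as process (?P<pid>\d+)\. Stop it first\.$"
)


def already_running(output):
    """Map host -> PID for every region server the start script refused to restart."""
    found = {}
    for line in output.splitlines():
        match = ALREADY_RUNNING.match(line.strip())
        if match:
            found[match.group("host")] = int(match.group("pid"))
    return found


# Sample text copied from the start-hbase.sh output above.
sample = """\
crunch2: regionserver running as process 6950. Stop it first.
chanel: regionserver running as process 22200. Stop it first.
starting master, logging to /opt/hadoop/hbase-0.20.1/bin/../logs/hbase-hadoop-master-chanel2.out
"""

print(already_running(sample))
```

Any host that shows up in the result is still running the region server from
an earlier start attempt, so its state reflects that earlier attempt.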


Here is the jstack from one of the boxes:

had...@chanel:/opt/hadoop/hbase-0.20.1$ jps
23777 TaskTracker
30449 Jps
23694 DataNode
26747 Main
22200 HRegionServer
30174 HQuorumPeer

had...@chanel:/opt/hadoop/hbase-0.20.1$ jstack 22200
2009-11-11 03:43:56
Full thread dump Java HotSpot(TM) Server VM (14.2-b01 mixed mode):

"Attach Listener" daemon prio=10 tid=0x083f8000 nid=0x7709 waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

"main-EventThread" daemon prio=10 tid=0x6e586400 nid=0x56e3 waiting on condition [0x6e4ad000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x73865330> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)

"main-SendThread" daemon prio=10 tid=0x6e572400 nid=0x56e2 waiting on condition [0x6e4fe000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:851)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:895)

"Low Memory Detector" daemon prio=10 tid=0x0813ac00 nid=0x56dd runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread1" daemon prio=10 tid=0x08139000 nid=0x56dc waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x08136400 nid=0x56db waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x08134c00 nid=0x56da runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Surrogate Locker Thread (CMS)" daemon prio=10 tid=0x08133400 nid=0x56d9 waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x0811f800 nid=0x56d8 in Object.wait() [0x6ec75000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x73860458> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
        - locked <0x73860458> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x0811e400 nid=0x56d7 in Object.wait() [0x6ecc6000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x738657e0> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked <0x738657e0> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x0805a800 nid=0x56d2 waiting on condition [0xb72f2000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hbase.util.Sleeper.sleep(Sleeper.java:74)
        at org.apache.hadoop.hbase.util.Sleeper.sleep(Sleeper.java:51)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.watchMasterAddress(HRegionServer.java:387)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.reinitializeZooKeeper(HRegionServer.java:315)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.reinitialize(HRegionServer.java:306)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:276)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.doMain(HRegionServer.java:2472)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2540)

"VM Thread" prio=10 tid=0x0811a400 nid=0x56d6 runnable

"Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x0805e400 nid=0x56d3 runnable

"Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x0805fc00 nid=0x56d4 runnable

"Concurrent Mark-Sweep GC Thread" prio=10 tid=0x080cd800 nid=0x56d5 runnable

"VM Periodic Task Thread" prio=10 tid=0x0813cc00 nid=0x56de waiting on condition

JNI global references: 691
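
Reading the dump, the "main" thread is sleeping inside
HRegionServer.watchMasterAddress (called from reinitializeZooKeeper in the
constructor), and "main-SendThread" is sleeping in
ClientCnxn$SendThread.startConnect. Going only by the method names, that looks
like the region server is still waiting to learn the master address through
ZooKeeper rather than having crashed. A rough sketch for pulling each thread's
state and HBase frames out of a dump, assuming the dump has one stack frame
per line (not wrapped by the mail client); the function name and the package
filter are my own choices:

```python
import re

THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
THREAD_STATE = re.compile(r"java\.lang\.Thread\.State: (?P<state>\w+)")
STACK_FRAME = re.compile(r"^\s*at\s+(?P<method>[^\s(]+)\(")


def summarize_threads(dump, package="org.apache.hadoop.hbase."):
    """Return {thread name: (state, [methods under `package`, top of stack first])}."""
    threads = {}
    current = None
    for line in dump.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group("name")
            threads[current] = [None, []]
            continue
        if current is None:
            continue  # preamble before the first thread header
        state = THREAD_STATE.search(line)
        if state:
            threads[current][0] = state.group("state")
            continue
        frame = STACK_FRAME.match(line)
        if frame and frame.group("method").startswith(package):
            threads[current][1].append(frame.group("method"))
    return {name: (state, frames) for name, (state, frames) in threads.items()}


# Shortened copy of the "main" thread from the dump above.
sample_dump = '''\
"main" prio=10 tid=0x0805a800 nid=0x56d2 waiting on condition [0xb72f2000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hbase.util.Sleeper.sleep(Sleeper.java:74)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.watchMasterAddress(HRegionServer.java:387)
'''

state, frames = summarize_threads(sample_dump)["main"]
print(state, frames)
```

Feeding it the full output of jstack on each box would show whether every
region server is parked in the same place.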

had...@chanel:/opt/hadoop/hbase-0.20.1$ ls -l
total 3628
drwxr-xr-x 2 hadoop hadoop    4096 2009-11-10 21:41 bin
-rw-r--r-- 1 hadoop hadoop   21416 2009-11-10 21:41 build.xml
-rw-r--r-- 1 hadoop hadoop  115584 2009-11-10 21:41 CHANGES.txt
drwxr-xr-x 2 hadoop hadoop    4096 2009-11-11 02:00 conf
drwxr-xr-x 4 hadoop hadoop    4096 2009-11-10 21:41 contrib
drwxr-xr-x 5 hadoop hadoop    4096 2009-11-10 21:41 docs
-rw-r--r-- 1 hadoop hadoop 1544829 2009-11-10 21:41 hbase-0.20.1.jar
-rw-r--r-- 1 hadoop hadoop 1954331 2009-11-10 21:41 hbase-0.20.1-test.jar
drwxr-xr-x 4 hadoop hadoop    4096 2009-11-10 21:41 lib
-rw-r--r-- 1 hadoop hadoop   11358 2009-11-10 21:41 LICENSE.txt
drwxr-xr-x 2 hadoop hadoop    4096 2009-11-11 03:38 logs
-rw-r--r-- 1 hadoop hadoop    1741 2009-11-10 21:41 NOTICE.txt
-rw-r--r-- 1 hadoop hadoop      43 2009-11-10 21:41 README.txt
drwxr-xr-x 8 hadoop hadoop    4096 2009-11-10 21:41 src
drwxr-xr-x 6 hadoop hadoop    4096 2009-11-10 21:41 webapps

had...@chanel:/opt/hadoop/hbase-0.20.1$ cd logs/
had...@chanel:/opt/hadoop/hbase-0.20.1/logs$ ll
total 72
-rw-r--r-- 1 hadoop hadoop 66759 2009-11-11 03:38 hbase-hadoop-zookeeper-chanel.log
-rw-r--r-- 1 hadoop hadoop     0 2009-11-11 03:38 hbase-hadoop-zookeeper-chanel.out
-rw-r--r-- 1 hadoop hadoop     0 2009-11-11 03:00 hbase-hadoop-zookeeper-chanel.out.1
-rw-r--r-- 1 hadoop hadoop     0 2009-11-11 02:56 hbase-hadoop-zookeeper-chanel.out.2
-rw-r--r-- 1 hadoop hadoop     0 2009-11-11 02:36 hbase-hadoop-zookeeper-chanel.out.3
-rw-r--r-- 1 hadoop hadoop     0 2009-11-11 02:18 hbase-hadoop-zookeeper-chanel.out.4
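
To make the "only ZooKeeper logs" observation easy to repeat on the other
boxes, here is a small sketch that reports which daemons have any log or .out
file. The naming pattern (hbase-<user>-<daemon>-<host> plus .log or .out and an
optional rotation number) is an assumption inferred from the file names in
this mail, and the function name is made up for the example:

```python
import re

# Assumed naming pattern, inferred from the listing above:
#   hbase-<user>-<daemon>-<host>.log, .out, or .out.<n>
LOG_NAME = re.compile(
    r"^hbase-(?P<user>[^-]+)-(?P<daemon>[^-]+)-(?P<host>.+?)\.(?:log|out)(?:\.\d+)?$"
)


def daemons_with_logs(filenames):
    """Return the set of daemon names that have at least one log or .out file."""
    daemons = set()
    for name in filenames:
        match = LOG_NAME.match(name)
        if match:
            daemons.add(match.group("daemon"))
    return daemons


# File names copied from the logs directory on chanel.
listing = [
    "hbase-hadoop-zookeeper-chanel.log",
    "hbase-hadoop-zookeeper-chanel.out",
    "hbase-hadoop-zookeeper-chanel.out.1",
]
found = daemons_with_logs(listing)
missing = {"zookeeper", "regionserver"} - found
print("found:", sorted(found), "missing:", sorted(missing))
```

On a real box the list could come from os.listdir() on the logs directory
instead of the hard-coded sample.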




On Wed, Nov 11, 2009 at 3:15 AM, Lars George <[email protected]> wrote:

> Chris,
>
> What do you mean there are no region server logs? On the M2-M5 you have no
> logs? Is the Java process for the RS running? If so, could you jstack it to
> see where it hangs?
>
> Maybe you have an access/owner issue with the log dirs on the RS machines?
>
> The master log looks OK.
>
> Lars
>
> Chris Bates wrote:
>
>> Again, I really appreciate the help.  I removed the master from the region
>> server list and made sure the rest of the machines had an updated list.
>> Still no region servers:
>> hbase(main):001:0> zk_dump
>>
>> HBase tree in ZooKeeper is rooted at /hbase
>>  Cluster up? true
>>  In safe mode? true
>>  Master address: 172.16.1.46:60000
>>  Region server holding ROOT: 172.16.1.46:60020
>>  Region servers:
>>
>> hbase(main):002:0> status 'simple'
>> 0 live servers
>> 0 dead servers
>>
>> I checked the /etc/hosts file on all machines and they all have 127.0.0.1
>> localhost.localdomain localhost and then their mappings for other
>> domains, with the mapping for the box's own name removed.
>>
>> There are no regionserver logs.  But the master log is this:
>> 2009-11-11 03:02:34,798 INFO org.apache.hadoop.hbase.master.RegionManager:
>> -ROOT- region unset (but not set to be reassigned)
>> 2009-11-11 03:02:34,799 INFO org.apache.hadoop.hbase.master.RegionManager:
>> ROOT inserted into regionsInTransition
>> 2009-11-11 03:02:35,078 INFO org.apache.zookeeper.ClientCnxn: Attempting
>> connection to server chanel2/172.16.1.46:2181
>> 2009-11-11 03:02:35,078 INFO org.apache.zookeeper.ClientCnxn: Priming
>> connection to java.nio.channels.SocketChannel[connected
>> local=/172.16.1.46:53335 remote=chanel2/172.16.1.46:2181]
>> 2009-11-11 03:02:35,078 INFO org.apache.zookeeper.ClientCnxn: Server
>> connection successful
>> 2009-11-11 03:02:35,179 INFO org.apache.hadoop.hbase.master.HMaster:
>> HMaster initialized on 172.16.1.46:60000
>> 2009-11-11 03:02:35,197 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
>> Initializing JVM Metrics with processName=Master, sessionId=HMaster
>> 2009-11-11 03:02:35,198 INFO
>> org.apache.hadoop.hbase.master.metrics.MasterMetrics: Initialized
>> 2009-11-11 03:02:35,373 INFO org.apache.hadoop.http.HttpServer: Port
>> returned by webServer.getConnectors()[0].getLocalPort() before open() is -1.
>> Opening the listener on 60010
>> 2009-11-11 03:02:35,374 INFO org.apache.hadoop.http.HttpServer:
>> listener.getLocalPort() returned 60010
>> webServer.getConnectors()[0].getLocalPort() returned 60010
>> 2009-11-11 03:02:35,374 INFO org.apache.hadoop.http.HttpServer: Jetty
>> bound to port 60010
>> 2009-11-11 03:02:52,692 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> Responder: starting
>> 2009-11-11 03:02:52,693 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> listener on 60000: starting
>> 2009-11-11 03:02:52,695 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> handler 0 on 60000: starting
>> 2009-11-11 03:02:52,695 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> handler 1 on 60000: starting
>> 2009-11-11 03:02:52,696 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> handler 2 on 60000: starting
>> 2009-11-11 03:02:52,696 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> handler 3 on 60000: starting
>> 2009-11-11 03:02:52,696 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> handler 4 on 60000: starting
>> 2009-11-11 03:02:52,697 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> handler 5 on 60000: starting
>> 2009-11-11 03:02:52,697 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> handler 6 on 60000: starting
>> 2009-11-11 03:02:52,697 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> handler 7 on 60000: starting
>> 2009-11-11 03:02:52,698 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> handler 8 on 60000: starting
>> 2009-11-11 03:02:52,698 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
>> handler 9 on 60000: starting
>> 2009-11-11 03:03:34,719 INFO org.apache.hadoop.hbase.master.ServerManager:
>> 0 region servers, 0 dead, average load NaN
>> 2009-11-11 03:03:35,200 INFO org.apache.hadoop.hbase.master.BaseScanner:
>> All 0 .META. region(s) scanned
>>
>>
>>
>> On Wed, Nov 11, 2009 at 2:39 AM, Jeff Zhang <[email protected]> wrote:
>>
>>
>>
>>> Hi Jean,
>>>
>>> Thank you, after I removed the mapping from sha-cs-03 to localhost it
>>> works.
>>>
>>> But I installed Hadoop successfully on these machines before. Does HBase
>>> handle the IP mapping differently from Hadoop?
>>>
>>>
>>> Jeff Zhang
>>>
>>>
>>>
>>> On Wed, Nov 11, 2009 at 1:29 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>>
>>>> Check your OS networking configuration, make sure nothing resolves
>>>> to localhost or 127.0.0.1 or 127.0.1.1
>>>>
>>>> Also you said you can't run the list, what does it do then?
>>>>
>>>> J-D
>>>>
>>>> On Tue, Nov 10, 2009 at 9:23 PM, Jeff Zhang <[email protected]> wrote:
>>>>
>>>>
>>>>> *I configured the region servers in the file regionservers as follows:*
>>>>>
>>>>> sha-cs-01
>>>>> sha-cs-02
>>>>> sha-cs-03
>>>>> sha-cs-05
>>>>> sha-cs-06
>>>>>
>>>>> *I also configured ZooKeeper in the file hbase-site.xml as follows:*
>>>>>
>>>>> <configuration>
>>>>>  <property>
>>>>>   <name>hbase.cluster.distributed</name>
>>>>>   <value>true</value>
>>>>>   <description>The mode the cluster will be in. Possible values are
>>>>>     false: standalone and pseudo-distributed setups with managed Zookeeper
>>>>>     true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
>>>>>   </description>
>>>>>  </property>
>>>>>  <property>
>>>>>     <name>hbase.zookeeper.property.clientPort</name>
>>>>>     <value>2222</value>
>>>>>     <description>Property from ZooKeeper's config zoo.cfg.
>>>>>     The port at which the clients will connect.
>>>>>     </description>
>>>>>   </property>
>>>>>  <property>
>>>>>     <name>hbase.zookeeper.quorum</name>
>>>>>     <value>*sha-cs-01,sha-cs-02,sha-cs-03,sha-cs-04,sha-cs-06*</value>
>>>>>     <description>Comma separated list of servers in the ZooKeeper Quorum.
>>>>>     For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
>>>>>     By default this is set to localhost for local and pseudo-distributed modes
>>>>>     of operation. For a fully-distributed setup, this should be set to a full
>>>>>     list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
>>>>>     this is the list of servers which we will start/stop ZooKeeper on.
>>>>>     </description>
>>>>>  </property>
>>>>>  <property>
>>>>>   <name>hbase.rootdir</name>
>>>>>   <value>hdfs://sha-cs-04:9000/hbase</value>
>>>>>   <description>The directory shared by region servers.
>>>>>   </description>
>>>>>  </property>
>>>>>
>>>>> </configuration>
>>>>>
>>>>>
>>>>> I still do not understand what's wrong with my configuration ?
>>>>>
>>>>>
>>>>> Jeff Zhang
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Nov 11, 2009 at 12:56 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>>>>
>>>>>> Please read my answer to Chris (written about 10-15 minutes ago); you
>>>>>> also seem to confuse region servers and ZooKeeper quorum members.
>>>>>>
>>>>>> In this case it also seems some region servers registered themselves
>>>>>> as localhost and then with the good address the master probably gave
>>>>>> them. Please check your OS network configuration and make sure the
>>>>>> hostname points at the right place.
>>>>>>
>>>>>> J-D
>>>>>>
>>>>>> On Tue, Nov 10, 2009 at 8:47 PM, Jeff Zhang <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>>> Hi Jean,
>>>>>>>
>>>>>>> I tried HBase 0.20.2. Looking at the logs, it seems the master and the
>>>>>>> regions work.
>>>>>>>
>>>>>>> But I cannot run the list command in the hbase shell. When I invoke the
>>>>>>> command status 'simple' in the hbase shell, it shows the following message:
>>>>>>> 09/11/11 12:42:55 DEBUG client.HConnectionManager$ClientZKWatcher: Got ZooKeeper event, state: SyncConnected, type: None, path: null
>>>>>>> 09/11/11 12:42:55 DEBUG zookeeper.ZooKeeperWrapper: Read ZNode /hbase/master got 10.148.224.13:60000
>>>>>>> 8 servers, 0 dead, 0.1250 average load
>>>>>>> hbase(main):002:0> status 'simple'
>>>>>>> 8 live servers
>>>>>>>   localhost:60020 1257914319445
>>>>>>>       requests=0, regions=0, usedHeap=0, maxHeap=0
>>>>>>>   sha-cs-03:60020 1257914321331
>>>>>>>       requests=0, regions=0, usedHeap=33, maxHeap=991
>>>>>>>   localhost:60020 1257914320265
>>>>>>>       requests=0, regions=0, usedHeap=0, maxHeap=0
>>>>>>>   sha-cs-01:60020 1257914320551
>>>>>>>       requests=0, regions=1, usedHeap=34, maxHeap=991
>>>>>>>   sha-cs-05:60020 1257914322656
>>>>>>>       requests=0, regions=0, usedHeap=33, maxHeap=991
>>>>>>>   sha-cs-06:60020 1257914321467
>>>>>>>       requests=0, regions=0, usedHeap=34, maxHeap=991
>>>>>>>   localhost:60020 1257914320202
>>>>>>>       requests=0, regions=0, usedHeap=0, maxHeap=0
>>>>>>>   localhost:60020 1257914321532
>>>>>>>       requests=0, regions=0, usedHeap=0, maxHeap=0
>>>>>>>
>>>>>>>
>>>>>>> It's weird that I have 3 localhost zookeeper entries here, when I
>>>>>>> actually set 5 machines in hbase.zookeeper.quorum.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Jeff Zhang
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Nov 11, 2009 at 9:47 AM, Jean-Daniel Cryans <[email protected]> wrote:
>>>>>>>
>>>>>>>> This particular problem is fixed in the current 0.20 branch and we
>>>>>>>> just released a candidate for 0.20.2, you can get it here
>>>>>>>> http://people.apache.org/~jdcryans/hbase-0.20.2-candidate-1/
>>>>>>>>
>>>>>>>> J-D
>>>>>>>>
>>>>>>>> On Tue, Nov 10, 2009 at 5:43 PM, Jeff Zhang <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> The following is the region server's log:
>>>>>>>>>
>>>>>>>>> 2009-11-10 18:09:08,062 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020: starting
>>>>>>>>> 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60020: starting
>>>>>>>>> 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60020: starting
>>>>>>>>> 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60020: starting
>>>>>>>>> 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60020: starting
>>>>>>>>> 2009-11-10 18:09:08,063 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60020: starting
>>>>>>>>> 2009-11-10 18:09:08,063 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: HRegionServer started at: 10.148.224.11:60020
>>>>>>>>> 2009-11-10 18:09:08,064 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020: starting
>>>>>>>>> 2009-11-10 18:09:08,070 INFO org.apache.hadoop.hbase.regionserver.StoreFile: Allocating LruBlockCache with maximum size 198.3m
>>>>>>>>> 2009-11-10 18:09:08,095 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_CALL_SERVER_STARTUP
>>>>>>>>> 2009-11-10 18:09:08,229 INFO org.apache.hadoop.hbase.regionserver.HLog: HLog configuration: blocksize=67108864, rollsize=63753420, enabled=true, flushlogentries=100, optionallogflushinternal=10000ms
>>>>>>>>> 2009-11-10 18:09:08,253 INFO org.apache.hadoop.hbase.regionserver.HLog: New hlog /hbase/.logs/10.148.224.11,60020,1257847748205/hlog.dat.1257847748229
>>>>>>>>> 2009-11-10 18:09:08,255 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 10.148.224.13:60000 that we are up
>>>>>>>>> 2009-11-10 18:09:08,302 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Unhandled exception. Aborting...
>>>>>>>>> java.lang.NullPointerException
>>>>>>>>>        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:459)
>>>>>>>>>        at java.lang.Thread.run(Thread.java:619)
>>>>>>>>> 2009-11-10 18:09:08,304 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=0, stores=0, storefiles=0, storefileIndexSize=0, memstoreSize=0, usedHeap=31, maxHeap=991, blockCacheSize=1707288, blockCacheFree=206264664, blockCacheCount=0, blockCacheHitRatio=0
>>>>>>>>> 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
>>>>>>>>> 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60020: exiting
>>>>>>>>> 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60020
>>>>>>>>> 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020: exiting
>>>>>>>>> 2009-11-10 18:09:08,304 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60020: exiting
>>>>>>>>> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020: exiting
>>>>>>>>> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60020: exiting
>>>>>>>>> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60020: exiting
>>>>>>>>> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60020: exiting
>>>>>>>>> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60020: exiting
>>>>>>>>> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60020: exiting
>>>>>>>>> 2009-11-10 18:09:08,305 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020: exiting
>>>>>>>>> 2009-11-10 18:09:08,306 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
>>>>>>>>> 2009-11-10 18:09:08,307 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
>>>>>>>>> 2009-11-10 18:09:08,412 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: regionserver/127.0.0.1:60020.cacheFlusher exiting
>>>>>>>>> 2009-11-10 18:09:08,412 INFO org.apache.hadoop.hbase.regionserver.LogFlusher: regionserver/127.0.0.1:60020.logFlusher exiting
>>>>>>>>> 2009-11-10 18:09:08,412 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: regionserver/127.0.0.1:60020.compactor exiting
>>>>>>>>> 2009-11-10 18:09:08,412 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
>>>>>>>>> 2009-11-10 18:09:08,413 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker: regionserver/127.0.0.1:60020.majorCompactionChecker exiting
>>>>>>>>> 2009-11-10 18:09:08,427 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: On abort, closed hlog
>>>>>>>>> 2009-11-10 18:09:08,428 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at: 10.148.224.11:60020
>>>>>>>>> 2009-11-10 18:09:17,489 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: worker thread exiting
>>>>>>>>> 2009-11-10 18:09:17,489 INFO org.apache.zookeeper.ZooKeeper: Closing session: 0x324dcceb05c0003
>>>>>>>>> 2009-11-10 18:09:17,490 INFO org.apache.zookeeper.ClientCnxn: Closing ClientCnxn for session: 0x324dcceb05c0003
>>>>>>>>> 2009-11-10 18:09:17,495 INFO org.apache.hadoop.hbase.Leases: regionserver/127.0.0.1:60020.leaseChecker closing leases
>>>>>>>>> 2009-11-10 18:09:17,495 INFO org.apache.hadoop.hbase.Leases: regionserver/127.0.0.1:60020.leaseChecker closed leases
>>>>>>>>> 2009-11-10 18:09:17,500 INFO org.apache.zookeeper.ClientCnxn: Exception while closing send thread for session 0x324dcceb05c0003 : Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
>>>>>>>>> 2009-11-10 18:09:17,604 INFO org.apache.zookeeper.ClientCnxn: Disconnecting ClientCnxn for session: 0x324dcceb05c0003
>>>>>>>>> 2009-11-10 18:09:17,604 INFO org.apache.zookeeper.ZooKeeper: Session: 0x324dcceb05c0003 closed
>>>>>>>>> 2009-11-10 18:09:17,605 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver/127.0.0.1:60020 exiting
>>>>>>>>> 2009-11-10 18:09:17,605 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>>>>>>>>> 2009-11-10 18:09:17,606 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Starting shutdown thread.
>>>>>>>>> 2009-11-10 18:09:17,606 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Shutdown thread complete
>>>>>>>>>
>>>>>>>>> On Tue, Nov 10, 2009 at 10:55 PM, Andrew Purtell <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> When you try to start the region servers, what do you see in the log?
>>>>>>>>>> If you don't change the client port (hbase.zookeeper.property.clientPort),
>>>>>>>>>> does it work?
>>>>>>>>>>
>>>>>>>>>>    - Andy
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ________________________________
>>>>>>>>>> From: Jeff Zhang <[email protected]>
>>>>>>>>>> To: [email protected]
>>>>>>>>>> Sent: Tue, November 10, 2009 2:40:28 PM
>>>>>>>>>> Subject: Re: HBase 0.20.1 Distributed Install Problems
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have the same problem: I cannot start the region server.
>>>>>>>>>>
>>>>>>>>>> When I invoke zk_dump
>>>>>>>>>>
>>>>>>>>>> it shows:
>>>>>>>>>>
>>>>>>>>>> HBase tree in ZooKeeper is rooted at /hbase
>>>>>>>>>>  Cluster up? true
>>>>>>>>>>  In safe mode? true
>>>>>>>>>>  Master address: 10.148.224.13:60000
>>>>>>>>>>  Region server holding ROOT: null
>>>>>>>>>>  Region servers:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The following is my hbase-site.xml
>>>>>>>>>>
>>>>>>>>>> <configuration>
>>>>>>>>>>  <property>
>>>>>>>>>>   <name>hbase.cluster.distributed</name>
>>>>>>>>>>   <value>true</value>
>>>>>>>>>>   <description>The mode the cluster will be in. Possible values are
>>>>>>>>>>     false: standalone and pseudo-distributed setups with managed Zookeeper
>>>>>>>>>>     true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
>>>>>>>>>>   </description>
>>>>>>>>>>  </property>
>>>>>>>>>>  <property>
>>>>>>>>>>   <name>hbase.rootdir</name>
>>>>>>>>>>   <value>hdfs://sha-cs-04:9000/hbase</value>
>>>>>>>>>>   <description>The directory shared by region servers.
>>>>>>>>>>   </description>
>>>>>>>>>>  </property>
>>>>>>>>>>  <property>
>>>>>>>>>>     <name>hbase.zookeeper.property.clientPort</name>
>>>>>>>>>>     <value>2222</value>
>>>>>>>>>>     <description>Property from ZooKeeper's config zoo.cfg.
>>>>>>>>>>     The port at which the clients will connect.
>>>>>>>>>>     </description>
>>>>>>>>>>  </property>
>>>>>>>>>>  <property>
>>>>>>>>>>     <name>hbase.zookeeper.quorum</name>
>>>>>>>>>>     <value>sha-cs-01,sha-cs-02,sha-cs-03,sha-cs-05,sha-cs-06</value>
>>>>>>>>>>     <description>Comma separated list of servers in the ZooKeeper Quorum.
>>>>>>>>>>     For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
>>>>>>>>>>     By default this is set to localhost for local and pseudo-distributed modes
>>>>>>>>>>     of operation. For a fully-distributed setup, this should be set to a full
>>>>>>>>>>     list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
>>>>>>>>>>     this is the list of servers which we will start/stop ZooKeeper on.
>>>>>>>>>>     </description>
>>>>>>>>>>   </property>
>>>>>>>>>>
>>>>>>>>>> </configuration>
>>>>>>>>>>
>>>>>>>>>> What's wrong with my configuration?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thank you in advance.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Jeff Zhang
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Nov 10, 2009 at 12:47 PM, Tatsuya Kawano
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> It looks like the master and the region servers cannot locate each
>>>>>>>>>>> other. HBase 0.20.x uses ZooKeeper (zk) to locate other cluster
>>>>>>>>>>> members, so maybe your zk has wrong information.
>>>>>>>>>>>
>>>>>>>>>>> Can you type zk_dump from hbase shell and let us know the result?
>>>>>>>>>>>
>>>>>>>>>>> If the cluster is properly configured, you'll get something like this:
>>>>>>>>>>>
>>>>>>>>>>> =====================================
>>>>>>>>>>> hbase(main):007:0> zk_dump
>>>>>>>>>>>
>>>>>>>>>>> HBase tree in ZooKeeper is rooted at /hbase
>>>>>>>>>>>  Cluster up? true
>>>>>>>>>>>  In safe mode? false
>>>>>>>>>>>  Master address: 172.16.80.26:60000
>>>>>>>>>>>  Region server holding ROOT: 172.16.80.27:60020
>>>>>>>>>>>  Region servers:
>>>>>>>>>>>  - 172.16.80.27:60020
>>>>>>>>>>>  - 172.16.80.29:60020
>>>>>>>>>>>  - 172.16.80.28:60020
>>>>>>>>>>> =====================================
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> one of my co-workers apparently can log into his box and submit jobs, but
>>>>>>>>>>>> me or anyone else is still unable to log in.
>>>>>>>>>>>
>>>>>>>>>>> Maybe you're a bit confused; your co-worker seems to be able to use
>>>>>>>>>>> Hadoop Map/Reduce, not HBase.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Does Hbase allow concurrent connections?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> Yes.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> I think it also says the master is on port 60000
>>>>>>>>>>>> when the install directions say it's supposed to be 60010?
>>>>>>>>>>>
>>>>>>>>>>> Port 60000 is correct. The master uses port 60000 to accept connections
>>>>>>>>>>> from hbase shell and region servers. Port 60010 is for the web-based
>>>>>>>>>>> HBase console.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> We tried applying this fix (to explicitly set the master):
>>>>>>>>>>>> http://osdir.com/ml/hbase-user-hadoop-apache/2009-05/msg00321.html
>>>>>>>>>>>
>>>>>>>>>>> No, this is an old way to configure a cluster. You shouldn't use this
>>>>>>>>>>> with HBase 0.20.x
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Tatsuya Kawano (Mr.)
>>>>>>>>>>> Tokyo, Japan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Nov 10, 2009 at 1:10 PM, Chris Bates
>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Another interesting data point.  We tried applying this fix (to explicitly
>>>>>>>>>>>> set the master):
>>>>>>>>>>>>
>>>>>>>>>>>> http://osdir.com/ml/hbase-user-hadoop-apache/2009-05/msg00321.html
>>>>>>>>>>>>
>>>>>>>>>>>> But when I log in to the master node, it takes really long to submit a query
>>>>>>>>>>>> and I get this in response:
>>>>>>>>>>>> hbase(main):001:0> list
>>>>>>>>>>>> NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
>>>>>>>>>>>> Trying to contact region server null for region , row '', but failed after 5
>>>>>>>>>>>> attempts.
>>>>>>>>>>>> Exceptions:
>>>>>>>>>>>> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
>>>>>>>>>>>> to locate root region
>>>>>>>>>>>> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
>>>>>>>>>>>> to locate root region
>>>>>>>>>>>> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
>>>>>>>>>>>> to locate root region
>>>>>>>>>>>> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
>>>>>>>>>>>> to locate root region
>>>>>>>>>>>> org.apache.hadoop.hbase.client.NoServerForRegionException: Timed out trying
>>>>>>>>>>>> to locate root region
>>>>>>>>>>>>
>>>>>>>>>>>> from org/apache/hadoop/hbase/client/HConnectionManager.java:1001:in `getRegionServerWithRetries'
>>>>>>>>>>>> from org/apache/hadoop/hbase/client/MetaScanner.java:55:in `metaScan'
>>>>>>>>>>>> from org/apache/hadoop/hbase/client/MetaScanner.java:28:in `metaScan'
>>>>>>>>>>>> from org/apache/hadoop/hbase/client/HConnectionManager.java:432:in `listTables'
>>>>>>>>>>>> from org/apache/hadoop/hbase/client/HBaseAdmin.java:127:in `listTables'
>>>>>>>>>>>> from sun/reflect/NativeMethodAccessorImpl.java:-2:in `invoke0'
>>>>>>>>>>>> from sun/reflect/NativeMethodAccessorImpl.java:39:in `invoke'
>>>>>>>>>>>> from sun/reflect/DelegatingMethodAccessorImpl.java:25:in `invoke'
>>>>>>>>>>>> from java/lang/reflect/Method.java:597:in `invoke'
>>>>>>>>>>>> from org/jruby/javasupport/JavaMethod.java:298:in `invokeWithExceptionHandling'
>>>>>>>>>>>> from org/jruby/javasupport/JavaMethod.java:259:in `invoke'
>>>>>>>>>>>> from org/jruby/java/invokers/InstanceMethodInvoker.java:36:in `call'
>>>>>>>>>>>> from org/jruby/runtime/callsite/CachingCallSite.java:253:in `cacheAndCall'
>>>>>>>>>>>> from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
>>>>>>>>>>>> from org/jruby/ast/CallNoArgNode.java:61:in `interpret'
>>>>>>>>>>>> from org/jruby/ast/ForNode.java:104:in `interpret'
>>>>>>>>>>>> ... 116 levels...
>>>>>>>>>>>> from opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb#start:-1:in `call'
>>>>>>>>>>>> from org/jruby/internal/runtime/methods/DynamicMethod.java:226:in `call'
>>>>>>>>>>>> from org/jruby/internal/runtime/methods/CompiledMethod.java:211:in `call'
>>>>>>>>>>>> from org/jruby/internal/runtime/methods/CompiledMethod.java:71:in `call'
>>>>>>>>>>>> from org/jruby/runtime/callsite/CachingCallSite.java:253:in `cacheAndCall'
>>>>>>>>>>>> from org/jruby/runtime/callsite/CachingCallSite.java:72:in `call'
>>>>>>>>>>>> from opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb.rb:497:in `__file__'
>>>>>>>>>>>> from opt/hadoop/hbase_minus_0_dot_20_dot_1/bin/$_dot_dot_/bin/hirb.rb:-1:in `load'
>>>>>>>>>>>> from org/jruby/Ruby.java:577:in `runScript'
>>>>>>>>>>>> from org/jruby/Ruby.java:480:in `runNormally'
>>>>>>>>>>>> from org/jruby/Ruby.java:354:in `runFromMain'
>>>>>>>>>>>> from org/jruby/Main.java:229:in `run'
>>>>>>>>>>>> from org/jruby/Main.java:110:in `run'
>>>>>>>>>>>> from org/jruby/Main.java:94:in `main'
>>>>>>>>>>>> from /opt/hadoop/hbase-0.20.1/bin/../bin/hirb.rb:338:in `list'
>>>>>>>>>>>> from (hbase):2
>>>>>>>>>>>> hbase(main):002:0>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Nov 9, 2009 at 10:52 PM, Chris Bates <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> thanks for your response Sujee.  These boxes are all on an internal DNS and
>>>>>>>>>>>>> they all resolve.
>>>>>>>>>>>>>
>>>>>>>>>>>>> one of my co-workers apparently can log into his box and submit jobs, but
>>>>>>>>>>>>> me or anyone else is still unable to log in.  Does Hbase allow concurrent
>>>>>>>>>>>>> connections?  In Hive I remember having to configure the metastore to be in
>>>>>>>>>>>>> server mode if multiple people were using it.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Nov 9, 2009 at 10:13 PM, Sujee Maniyam <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [had...@crunch hbase-0.20.1]$ bin/start-hbase.sh
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> crunch2: Warning: Permanently added 'crunch2' (RSA) to the list of known
>>>>>>>>>>>>>>> hosts.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> is your SSH setup correctly?  From master, you need to be able to
>>>>>>>>>>>>>> login to all slaves/regionservers without password
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And I see you are using short hostnames (crunch2, crunch3), do they
>>>>>>>>>>>>>> all resolve correctly?  or you need to update /etc/hosts to resolve
>>>>>>>>>>>>>> these to an IP address on all machines.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> regards
>>>>>>>>>>>>>> Sujee Maniyam
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> http://sujee.net
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>
>>
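Tatsuya's zk_dump suggestion quoted above can be checked mechanically. Below is a minimal sketch, assuming the zk_dump output has the same line layout as the sample he pasted; the function names parse_zk_dump and looks_healthy are made up for illustration and are not part of HBase.

```python
def parse_zk_dump(text):
    """Pull the cluster state out of `zk_dump` output.

    Assumes the line layout of the sample quoted in this thread; other
    HBase versions may print something different.
    """
    info = {
        "cluster_up": None,
        "master": None,
        "root_server": None,
        "region_servers": [],
    }
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Cluster up?"):
            info["cluster_up"] = line.split("?", 1)[1].strip() == "true"
        elif line.startswith("Master address:"):
            info["master"] = line.split(":", 1)[1].strip()
        elif line.startswith("Region server holding ROOT:"):
            info["root_server"] = line.split(":", 1)[1].strip()
        elif line.startswith("- "):
            # Entries listed under "Region servers:"
            info["region_servers"].append(line[2:].strip())
    return info


def looks_healthy(info):
    """True when a master, a ROOT holder and at least one region server are registered."""
    return bool(
        info["cluster_up"]
        and info["master"]
        and info["root_server"]
        and info["region_servers"]
    )
```

If looks_healthy returns False because the region server list is empty, that would match the symptom in this thread: region servers that never registered with ZooKeeper.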
>
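Sujee's question about whether the short hostnames resolve on every machine, and Tatsuya's note about ports 60000 and 60010, can be turned into a quick check. This is only a sketch: resolve_all and port_open are hypothetical helper names, and it uses nothing beyond Python's standard socket module.

```python
import socket


def resolve_all(hosts, resolver=socket.gethostbyname):
    """Map each hostname to its resolved IP address, or None if lookup fails."""
    results = {}
    for host in hosts:
        try:
            results[host] = resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[host] = None
    return results


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run on each box, something like resolve_all(["crunch2", "crunch3"]) would show any name that fails to resolve locally, and port_open(master_host, 60000) would show whether the master port is reachable from a region server (master_host is a placeholder for your own master's hostname).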
