Does HBase/Hadoop have to run on kernel 2.6.27 or JDK 1.6? It seems that only that kernel provides epoll resource configuration.

This is the first time I have seen this error, and it appeared after I started using machines with fewer resources for ZooKeeper. Probably I should change that back.

zhenyu

On Fri, Nov 13, 2009 at 4:37 PM, Zhenyu Zhong <[email protected]> wrote:

The file-descriptor limits were raised: fs.file-max = 1578334, and in limits.conf the value was set to 32768. So these are way higher than the open descriptors of the running processes.

thanks
zhenyu

On Fri, Nov 13, 2009 at 4:33 PM, Stack <[email protected]> wrote:

You upped the ulimit file descriptors as per the getting-started doc?
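Kernel 2.6.27 is where the fs.epoll sysctls appeared, which is why the 2.6.26 kernel mentioned below has nothing to tune there. A sketch of inspecting and raising those limits on a kernel that exposes them (the key names here are from memory of 2.6.27-era kernels, so verify against /proc/sys/fs/epoll on your own box):

    # List whatever epoll knobs this kernel exposes:
    ls /proc/sys/fs/epoll/

    # On 2.6.27-era kernels the per-user instance cap defaults to 128:
    cat /proc/sys/fs/epoll/max_user_instances

    # Raise it for the running system, then persist across reboots:
    sysctl -w fs.epoll.max_user_instances=4096
    echo 'fs.epoll.max_user_instances = 4096' >> /etc/sysctl.conf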
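The limits being compared here can be checked directly from a shell; a minimal sketch, where "hadoop" stands in for whatever user actually runs the HBase and Hadoop daemons:

    # System-wide ceiling on open files (the fs.file-max value above):
    cat /proc/sys/fs/file-max

    # Per-process ceiling for a login shell, driven by limits.conf
    # entries such as:
    #   hadoop  soft  nofile  32768
    #   hadoop  hard  nofile  32768
    ulimit -n

    # Descriptors actually held right now by that user's processes:
    lsof -u hadoop | wc -l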
On Nov 13, 2009, at 1:26 PM, Zhenyu Zhong <[email protected]> wrote:

Thanks a lot.

The bad news is that my kernel is still 2.6.26, yet this was not a problem before. Very strange.

zhenyu

On Fri, Nov 13, 2009 at 4:16 PM, Jean-Daniel Cryans <[email protected]> wrote:

Looks like
http://pero.blogs.aprilmayjune.org/2009/01/22/hadoop-and-linux-kernel-2627-epoll-limits/

J-D

On Fri, Nov 13, 2009 at 1:12 PM, Zhenyu Zhong <[email protected]> wrote:

Hi,

After I reorganized the cluster, the experiment ran into problems faster than before. Basically, the change was to use the machines with fewer resources as ZooKeeper quorum members and the machines with more resources as regionservers.

From the log I still see the pause of around 1 minute. I enabled GC logging; please see http://pastebin.com/m1d4ce0f1 for details. In general I don't see many pauses in the GC.

What is more interesting: one hour after the first regionserver disconnected, the master log complained about too many open files. This didn't happen before. I checked the OS setup and limits.conf, and I also checked the running processes; the total open files don't reach the limit, so I am a bit confused.

Please see the following master log:

2009-11-13 20:06:02,114 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 4658 row(s) of meta region {server: 192.168.100.128:60021, regionname: .META.,,1, startKey: <>} complete
2009-11-13 20:06:02,114 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
2009-11-13 20:06:07,677 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x424eebf1c10004c after 3ms
2009-11-13 20:06:08,178 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.100.123:50010
2009-11-13 20:06:08,178 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-2808245019291145247_5478039
2009-11-13 20:06:09,682 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2009-11-13 20:06:09,682 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_1074853606841896259_5478048
2009-11-13 20:06:10,334 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x24eebf1043003c after 1ms
2009-11-13 20:06:21,018 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x424eebf1c10004c after 0ms
2009-11-13 20:06:23,674 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x24eebf1043003c after 0ms
2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.100.123:50010
2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6642544517082142289_5478063
2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_2057511041109796090_5478063
2009-11-13 20:06:24,928 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:24,928 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_8219260302213892894_5478064
2009-11-13 20:06:30,855 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:30,855 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_1669205542853067709_5478235
2009-11-13 20:06:30,905 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:30,905 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_9128897691346270351_5478237
2009-11-13 20:06:30,955 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:30,955 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_1116845144864123018_5478240
2009-11-13 20:06:34,372 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x424eebf1c10004c after 0ms
2009-11-13 20:06:37,034 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x24eebf1043003c after 0ms
2009-11-13 20:06:37,235 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Too many open files
        at sun.nio.ch.IOUtil.initPipe(Native Method)
        at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49)
        at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:407)
        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:322)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2290)
2009-11-13 20:06:37,235 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8148813491785406356_5478475 bad datanode[0] 192.168.100.123:50010
2009-11-13 20:06:37,235 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8148813491785406356_5478475 in pipeline 192.168.100.123:50010, 192.168.100.134:50010, 192.168.100.122:50010: bad datanode 192.168.100.123:50010
2009-11-13 20:06:37,436 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:37,436 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_2119727700857186236_5478498
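Note that the trace above ends in sun.nio.ch.IOUtil.initPipe: the DataStreamer could not even allocate a new epoll selector, so descriptors are being exhausted inside this one process even though the system-wide counters look fine. A sketch of checking the per-process view (the jps pattern assumes the master shows up as HMaster):

    # Count descriptors held by the HBase master process:
    MASTER_PID=$(jps | awk '/HMaster/ {print $1}')
    ls /proc/$MASTER_PID/fd | wc -l

    # Compare against the limit the process was actually started under:
    grep 'Max open files' /proc/$MASTER_PID/limits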
On Thu, Nov 12, 2009 at 4:21 PM, Zhenyu Zhong <[email protected]> wrote:

Will do.

thanks
zhenyu

On Thu, Nov 12, 2009 at 3:33 PM, stack <[email protected]> wrote:

Enable GC logging too on this next run (see hbase-env.sh). Let's try and get to the bottom of what's going on.

Thanks,
St.Ack
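In hbase-env.sh, turning GC logging on amounts to adding the stock JDK 6 flags to HBASE_OPTS; a minimal sketch (the log path is a placeholder, and the commented-out template in your copy of hbase-env.sh may differ):

    # conf/hbase-env.sh
    export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
        -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-hbase.log"

Any pause longer than the ZooKeeper session timeout will then be visible in that log.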
On Thu, Nov 12, 2009 at 12:29 PM, Zhenyu Zhong <[email protected]> wrote:

I can switch the boxes that run zookeeper with the ones that run regionservers. I will see the results later.

FYI, the node does have the 10-minute zookeeper.session.timeout value in place.

thanks
zhenyu

On Thu, Nov 12, 2009 at 3:21 PM, stack <[email protected]> wrote:

On Thu, Nov 12, 2009 at 11:50 AM, Zhenyu Zhong <[email protected]> wrote:
> In my cluster, half of the machines have 2 disks of 400 GB each and half have 6 disks. Maybe we should run zookeeper on the machines with 2 disks and RS on the machines with 6 disks?

That would make the most sense; only, in the below it looks like the RS that had the issue had 4 disks?

> BTW, the 10-minute zookeeper.session.timeout had been set during the experiment.

And for sure this node had it in place?

St.Ack

> thanks
> zhenyu
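For reference, the 10-minute timeout being discussed is the client-side zookeeper.session.timeout HBase property. Set in hbase-site.xml it would look like the sketch below (the value is in milliseconds); note the quorum can still negotiate the final value down if it exceeds the server's own bounds, which are a multiple of tickTime:

    <property>
      <name>zookeeper.session.timeout</name>
      <value>600000</value> <!-- 10 minutes -->
    </property>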
On Thu, Nov 12, 2009 at 2:08 PM, stack <[email protected]> wrote:

On Thu, Nov 12, 2009 at 8:40 AM, Zhenyu Zhong <[email protected]> wrote:
> Though I experienced 2 regionserver disconnections this morning, it looks better from the regionserver log (please see the following):
>
> http://pastebin.com/m496dbfae
>
> I checked disk I/O for one of the regionservers (192.168.100.116) during the time it disconnected:
>
> Time: 03:04:58 AM
> Device:   tps     Blk_read/s  Blk_wrtn/s   Blk_read     Blk_wrtn
> sda       105.31  5458.83     19503.64     9043873239   32312473676
> sda1        2.90     6.64        99.25       10993934     164433464
> sda2        1.72    23.76        30.25       39365817      50124008
> sda3        0.30     0.38         3.58         624930       5923000
> sda4      100.39  5428.06     19370.56     8992888270   32091993204

Is this high for you? 20k blocks/second would seem to be high, but it's one disk only, and it's not being shared by zk anymore, so it shouldn't matter?

> I also checked the zookeeper quorum server that the regionserver tried to connect to according to the log. However, I don't see 192.168.100.116 in the client list of that quorum server. Would that be a problem?

Is that because the ephemeral node for the regionserver had evaporated? Lost its lease w/ zk by the time you went to look?

> Thu Nov 12 15:04:52 UTC 2009
> Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
> Clients:
>  /192.168.100.127:43045[1](queued=0,recved=26,sent=0)
>  /192.168.100.131:39091[1](queued=0,recved=964,sent=0)
>  /192.168.100.124:35961[1](queued=0,recved=3266,sent=0)
>  /192.168.100.123:47935[1](queued=0,recved=233,sent=0)
>  /192.168.100.125:46931[1](queued=0,recved=2,sent=0)
>  /192.168.100.118:54924[1](queued=0,recved=295,sent=0)
>  /192.168.100.118:41390[1](queued=0,recved=2290,sent=0)
>  /192.168.100.136:42243[1](queued=0,recved=0,sent=0)
>
> Latency min/avg/max: 0/17/6333
> Received: 47111
> Sent: 0
> Outstanding: 0
> Zxid: 0x77000083f4
> Mode: leader
> Node count: 23

That 6-second maximum latency is pretty bad but should be well within the zk session timeout.
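The quorum status block quoted above is what ZooKeeper's four-letter "stat" command returns; it can be polled from any shell while the cluster runs (the host is a placeholder for one of your quorum members):

    # Client list, latency and mode from one quorum member:
    echo stat | nc zk-host-1 2181

    # Quick liveness probe; a healthy server answers "imok":
    echo ruok | nc zk-host-1 2181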
So the problem is likely on the zk client side of the session, i.e. in the RS. You could enable GC logging as J-D suggested to see if you have any big pauses, pauses > zk session timeout.

When the RS went down, it didn't look too heavily loaded:

2009-11-12 15:04:52,830 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=1.5166667, regions=322, stores=657, storefiles=631, storefileIndexSize=61, memstoreSize=1472, usedHeap=2819, maxHeap=4079, blockCacheSize=658110960, blockCacheFree=197395984, blockCacheCount=9903, blockCacheHitRatio=99

It's serving a few reads? The number of store files seems fine. Not too much memory used.

Looking at the logs, I see the Lease Still Held exception. This happens when the RS does its regular report to the master but the master thinks the RS has since restarted. It will think this probably because it noticed that the RS's znode in zk had gone away, so it considered the RS dead.

Looking too at your logs, I see this gap in the zk pinging:

2009-11-12 15:03:39,325 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x224e55436ad0004 after 0ms
2009-11-12 15:03:43,113 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x24e55436a0007d after 0ms

In the lines above these, it's reporting about every ten seconds; here there is a big gap.

Do you have ganglia or something that will let you look more into what was happening on this machine around that time? Is the machine OK? It looks lightly loaded, and you have your cluster nicely laid out; something odd is going on. What about things like the write speed to disk? In the past, strange issues have been explained by an incorrectly set BIOS that made disks run at 1/100th of their proper speed.

St.Ack

> Best,
> zhenyu

On Wed, Nov 11, 2009 at 3:58 PM, Zhenyu Zhong <[email protected]> wrote:

Stack,

I very much appreciate your comments. I will run the zookeeper monitoring script on my cluster overnight and see the result. I will follow up on this thread when I have anything.

thanks
zhenyu

On Wed, Nov 11, 2009 at 3:52 PM, stack <[email protected]> wrote:

I see these in your log too:

2009-11-11 04:27:19,018 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x424dfd908c50009 after 4544ms
2009-11-11 04:27:19,018 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x24dfd90c810002 after 4548ms
2009-11-11 04:27:43,960 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x424dfd908c50009 after 9030ms
2009-11-11 04:27:43,960 DEBUG org.apache.zookeeper.
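On the write-speed-to-disk question raised earlier in the thread: a crude sanity check that would expose a disk running at a fraction of its proper speed (device and file paths are placeholders; the dd run writes 1 GB):

    # Raw sequential read rate of the device:
    hdparm -t /dev/sda

    # Write rate through the filesystem; conv=fdatasync makes dd
    # include the final flush in its timing:
    dd if=/dev/zero of=/data/ddtest.tmp bs=1M count=1024 conv=fdatasync
    rm /data/ddtest.tmp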
