J-D, Thank you very much for your comments. My company blocks the IRC port, so I'm having trouble connecting to the IRC channel. I have asked IT to open the port for me, but it may take a while.
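In the meantime, here is how I read the jstack step you suggest below, plus a check of the file-descriptor limit the running process actually got (since the earlier problem was starting HBase as root with the 1024 default). This is only a rough sketch; it assumes the region server was started by the hadoop user and that jps/jstack from the JDK are on the path:

  # Find the HRegionServer pid (jps ships with the JDK)
  RS_PID=$(jps | awk '/HRegionServer/ {print $1}')
  # Dump all thread stacks so we can see what is holding up the shutdown;
  # run it as the same user that started the process. If a hung JVM does
  # not respond, "jstack -F" forces the dump.
  jstack "$RS_PID" > /tmp/regionserver-jstack-$(date +%s).txt
  # Verify the open-files limit the live process is really running with
  grep 'open files' /proc/$RS_PID/limits
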
Best, zhenyu On Sat, Nov 14, 2009 at 2:21 PM, Jean-Daniel Cryans <[email protected]>wrote: > The error you are getting is a disconnection from a zookeeper server > and is very generic. > > ZK-86 is still opened and the last comment refers to ZK-111 saying > that the bug (in unit tests) was probably fixed in release 3.0.0 last > year. > > To figure the hang you have, you can try to jstack the process pid and > see exactly what's holding the RS from shutting down. > > Would it be possible for you to drop by the IRC channel? This way we > can debug this a at much faster pace. > > Thx! > > J-D > > On Sat, Nov 14, 2009 at 9:57 AM, Zhenyu Zhong <[email protected]> > wrote: > > I found this. > > http://issues.apache.org/jira/browse/ZOOKEEPER-86 > > > > It looks like the same error I had. Is it a zookeeper bug? When will > HBase > > take the zookeeper version 3.3.0? > > > > thanks > > zhenyu > > > > On Sat, Nov 14, 2009 at 11:31 AM, Zhenyu Zhong <[email protected] > >wrote: > > > >> Now I really doubt about the zookeeper, from the log I saw errors like > >> IOException Read Error, while the zookeeper client (regionserver ) tried > to > >> read. But it got disconneted status from zookeeper. > >> > >> I don't see any load on any zookeeper quorum servers. DiskIO is fine. > >> > >> Also when the regionserver decides to exit due to the disconnect status > >> from zookeeper, sometimes the regionserver hangs during the exiting. We > can > >> still see the HRegionServer process even we don't see it in the Master > web > >> interface. > >> > >> I also notice that there is an zookeeper.retries and zookeeper.pauses > >> settings in hbase-default.xml, would that matter ? > >> > >> thanks > >> > >> You may see the IOException error in the following log when the > >> Regionserver lost connection to the zookeeper. > >> > >> 2009-11-14 15:58:31,993 INFO org.apache.hadoop.hbase.regionserver.HLog: > >> Roll > >> > /hbase/.logs/superpyxis0008.scur.colo,60021,1258179101617/hlog.dat.1258179106921, > >> entries=1062964, calcsize=255014936, filesize=139566093. 
New hlog > >> > /hbase/.logs/superpyxis0008.scur.colo,60021,1258179101617/hlog.dat.1258214311993 > >> 2009-11-14 15:58:39,295 DEBUG org.apache.zookeeper.ClientCnxn: Got ping > >> response for sessionid:0x424f0ed01ea00a6 after 0ms > >> 2009-11-14 15:58:52,648 DEBUG org.apache.zookeeper.ClientCnxn: Got ping > >> response for sessionid:0x424f0ed01ea00a6 after 0ms > >> 2009-11-14 15:59:06,007 DEBUG org.apache.zookeeper.ClientCnxn: Got ping > >> response for sessionid:0x424f0ed01ea00a6 after 0ms > >> 2009-11-14 15:59:19,365 DEBUG org.apache.zookeeper.ClientCnxn: Got ping > >> response for sessionid:0x424f0ed01ea00a6 after 0ms > >> 2009-11-14 15:59:32,867 DEBUG org.apache.zookeeper.ClientCnxn: Got ping > >> response for sessionid:0x424f0ed01ea00a6 after 186ms > >> 2009-11-14 15:59:46,070 DEBUG org.apache.zookeeper.ClientCnxn: Got ping > >> response for sessionid:0x424f0ed01ea00a6 after 0ms > >> 2009-11-14 15:59:59,423 DEBUG org.apache.zookeeper.ClientCnxn: Got ping > >> response for sessionid:0x424f0ed01ea00a6 after 0ms > >> 2009-11-14 16:00:12,775 DEBUG org.apache.zookeeper.ClientCnxn: Got ping > >> response for sessionid:0x424f0ed01ea00a6 after 0ms > >> 2009-11-14 16:00:28,141 DEBUG org.apache.zookeeper.ClientCnxn: Got ping > >> response for sessionid:0x424f0ed01ea00a6 after 2010ms > >> 2009-11-14 16:01:10,378 WARN org.apache.zookeeper.ClientCnxn: Exception > >> closing session 0x424f0ed01ea00a6 to > sun.nio.ch.selectionkeyi...@39fbb2d6 > >> > >> java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 > >> lim=4 cap=4] > >> at > >> org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701) > >> at > >> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945) > >> 2009-11-14 16:01:10,478 INFO > >> org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, > >> state: Disconnected, type: None, path: null > >> 2009-11-14 16:01:11,333 WARN > >> org.apache.hadoop.hbase.regionserver.HRegionServer: Attempt=1 > >> > >> org.apache.hadoop.hbase.Leases$LeaseStillHeldException > >> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > >> Method) > >> at > >> > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) > >> at > >> > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) > >> at > java.lang.reflect.Constructor.newInstance(Constructor.java:513) > >> at > >> > org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94) > >> at > >> > org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48) > >> at > >> > org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66) > >> at > >> > org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:571) > >> at java.lang.Thread.run(Thread.java:619) > >> 2009-11-14 16:01:11,433 INFO org.apache.zookeeper.ClientCnxn: Attempting > >> connection to server superpyxis0001/192.168.100.115:2181 > >> 2009-11-14 16:01:11,433 INFO org.apache.zookeeper.ClientCnxn: Priming > >> connection to java.nio.channels.SocketChannel[connected local=/ > >> 192.168.100.122:59575 remote=superpyxis0001/192.168.100.115:2181] > >> 2009-11-14 16:01:11,433 INFO org.apache.zookeeper.ClientCnxn: Server > >> connection successful > >> 2009-11-14 16:01:11,433 INFO > >> org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, > >> state: Expired, type: None, path: null > >> 2009-11-14 16:01:11,433 WARN 
org.apache.zookeeper.ClientCnxn: Exception > >> closing session 0x424f0ed01ea00a6 to > sun.nio.ch.selectionkeyi...@6b93d343 > >> java.io.IOException: Session Expired > >> at > >> > org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589) > >> at > >> org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709) > >> > >> at > >> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945) > >> 2009-11-14 16:01:11,433 ERROR > >> org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session > >> expired > >> 2009-11-14 16:01:13,513 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > >> handler 3 on 60021, call put([...@73deb204, > >> [Lorg.apache.hadoop.hbase.client.Put;@2179600a) from > 192.168.100.132:40728: > >> error: java.io.IOException: Server not running, aborting > >> java.io.IOException: Server not running, aborting > >> > >> > >> > >> > >> > >> On Sat, Nov 14, 2009 at 1:01 AM, Zhenyu Zhong <[email protected] > >wrote: > >> > >>> Stack, > >>> > >>> Thanks a lot! > >>> > >>> I found out that the reason why HBase doesn't take the system file > >>> descriptor value I set before. I started the HBase using root instead > of the > >>> normal hadoop user, while my system configures hadoop with higher file > >>> descriptor value but configures root with 1024 default value. > >>> > >>> > >>> Now my system has a clean start. However, it seems that the > >>> zookeeper.session.timeout value doesn't take into effect. I still found > >>> around 60 seconds pauses from the disconnected regionserver. I really > don't > >>> know why regionserver only times out after 60 seconds instead of 10 > minutes > >>> which I set for zookeeper.session.timeout. > >>> > >>> Is there any other timeout value coming into play before the actual > >>> session times out? > >>> > >>> zhenyu > >>> > >>> > >>> > >>> On Fri, Nov 13, 2009 at 7:08 PM, stack <[email protected]> wrote: > >>> > >>>> Ok. Lack of file descriptors manifests in all kinds of weird ways. > >>>> Hopefully thats it. If not, lets keep digging. > >>>> St.Ack > >>>> > >>>> On Fri, Nov 13, 2009 at 3:44 PM, Zhenyu Zhong < > [email protected] > >>>> >wrote: > >>>> > >>>> > Stack, > >>>> > > >>>> > You are right, the master started with ulimit -n 1024. It doesn't > take > >>>> the > >>>> > system value. > >>>> > > >>>> > Regarding to the too many open files, it looks like the same as the > one > >>>> J-D > >>>> > put up. But I will get the Master start with higher value first and > see > >>>> if > >>>> > this kind of error goes away. > >>>> > > >>>> > thanks a lot! > >>>> > zhenyu > >>>> > > >>>> > On Fri, Nov 13, 2009 at 6:02 PM, stack <[email protected]> wrote: > >>>> > > >>>> > > Does it say > >>>> > > > >>>> > > ulimit -n 32768 > >>>> > > > >>>> > > ...as the second line in your log file on start of the master? > >>>> > > > >>>> > > Is the stack trace that complains about too many open files same > as > >>>> the > >>>> > one > >>>> > > in the blog post J-D put up? > >>>> > > > >>>> > > St.Ack > >>>> > > > >>>> > > > >>>> > > On Fri, Nov 13, 2009 at 1:37 PM, Zhenyu Zhong < > >>>> [email protected] > >>>> > > >wrote: > >>>> > > > >>>> > > > The ulimit file descriptors was set to fs.file-max =1578334, > also > >>>> in > >>>> > the > >>>> > > > limits.conf the value was set to 32768. > >>>> > > > So these are way higher than the open descriptors for the > running > >>>> > > > processes. 
> >>>> > > > > >>>> > > > thanks > >>>> > > > zhenyu > >>>> > > > > >>>> > > > > >>>> > > > On Fri, Nov 13, 2009 at 4:33 PM, Stack <[email protected]> > >>>> wrote: > >>>> > > > > >>>> > > > > You upped the ulimit file descriptors as per the getting > started > >>>> doc? > >>>> > > > > > >>>> > > > > > >>>> > > > > > >>>> > > > > On Nov 13, 2009, at 1:26 PM, Zhenyu Zhong < > >>>> [email protected]> > >>>> > > > wrote: > >>>> > > > > > >>>> > > > > Thanks a lot. > >>>> > > > >> > >>>> > > > >> > >>>> > > > >> Bad news is my kernel is still 2.6.26. > >>>> > > > >> But it was not a problem before. > >>>> > > > >> > >>>> > > > >> Very strange. > >>>> > > > >> > >>>> > > > >> zhenyu > >>>> > > > >> > >>>> > > > >> On Fri, Nov 13, 2009 at 4:16 PM, Jean-Daniel Cryans < > >>>> > > > [email protected] > >>>> > > > >> >wrote: > >>>> > > > >> > >>>> > > > >> Looks like > >>>> > > > >>> > >>>> > > > >>> > >>>> > > > > >>>> > > > >>>> > > >>>> > http://pero.blogs.aprilmayjune.org/2009/01/22/hadoop-and-linux-kernel-2627-epoll-limits/ > >>>> > > > >>> > >>>> > > > >>> J-D > >>>> > > > >>> > >>>> > > > >>> On Fri, Nov 13, 2009 at 1:12 PM, Zhenyu Zhong < > >>>> > > [email protected] > >>>> > > > > > >>>> > > > >>> wrote: > >>>> > > > >>> > >>>> > > > >>>> Hi, > >>>> > > > >>>> > >>>> > > > >>>> > >>>> > > > >>>> After I re-organize the cluster, the experiment ran into > >>>> problem > >>>> > > > faster > >>>> > > > >>>> > >>>> > > > >>> than > >>>> > > > >>> > >>>> > > > >>>> before. > >>>> > > > >>>> > >>>> > > > >>>> Basically the changes are to use machines with less > resources > >>>> as > >>>> > > > >>>> > >>>> > > > >>> zookeeper > >>>> > > > >>> > >>>> > > > >>>> quorums and machines with more resources as regionserver. > >>>> > > > >>>> > >>>> > > > >>>> From the log, I still see the pause around 1 minute. > >>>> > > > >>>> I enabled the GC logging, please see > >>>> > > > >>>> > >>>> > > > >>>> http://pastebin.com/m1d4ce0f1 > >>>> > > > >>>> > >>>> > > > >>>> for details. > >>>> > > > >>>> In general I don't see many pauses in the GC. > >>>> > > > >>>> > >>>> > > > >>>> What is more interesting, 1 hour after the 1st regionserver > >>>> > > > >>>> disconnected, > >>>> > > > >>>> the master log complained about too many open files. This > >>>> didn't > >>>> > > > happen > >>>> > > > >>>> before. > >>>> > > > >>>> I checked the system OS setup as well as the limits.conf. I > >>>> also > >>>> > > > checked > >>>> > > > >>>> > >>>> > > > >>> the > >>>> > > > >>> > >>>> > > > >>>> running processes. The total open files don't reach the > limit. > >>>> I > >>>> > am > >>>> > > > >>>> > >>>> > > > >>> confused > >>>> > > > >>> > >>>> > > > >>>> a bit. > >>>> > > > >>>> > >>>> > > > >>>> > >>>> > > > >>>> Please see the following master log. > >>>> > > > >>>> > >>>> > > > >>>> 2009-11-13 20:06:02,114 INFO > >>>> > > > org.apache.hadoop.hbase.master.BaseScanner: > >>>> > > > >>>> RegionManager.metaScanner scan of 4658 row(s) of meta > region > >>>> > > {server: > >>>> > > > >>>> 192.168.100.128:60021, regionname: .META.,,1, startKey: > <>} > >>>> > > complete > >>>> > > > >>>> 2009-11-13 20:06:02,114 INFO > >>>> > > > org.apache.hadoop.hbase.master.BaseScanner: > >>>> > > > >>>> > >>>> > > > >>> All > >>>> > > > >>> > >>>> > > > >>>> 1 .META. 
region(s) scanned > >>>> > > > >>>> 2009-11-13 20:06:07,677 DEBUG > org.apache.zookeeper.ClientCnxn: > >>>> Got > >>>> > > > ping > >>>> > > > >>>> response for sessionid:0x424eebf1c10004c after 3ms > >>>> > > > >>>> 2009-11-13 20:06:08,178 INFO > org.apache.hadoop.hdfs.DFSClient: > >>>> > > > Exception > >>>> > > > >>>> > >>>> > > > >>> in > >>>> > > > >>> > >>>> > > > >>>> createBlockOutputStream java.io.IOException: Bad connect > ack > >>>> with > >>>> > > > >>>> firstBadLink 192.168.100.123:50010 > >>>> > > > >>>> 2009-11-13 20:06:08,178 INFO > org.apache.hadoop.hdfs.DFSClient: > >>>> > > > >>>> Abandoning > >>>> > > > >>>> block blk_-2808245019291145247_5478039 > >>>> > > > >>>> 2009-11-13 20:06:09,682 INFO > org.apache.hadoop.hdfs.DFSClient: > >>>> > > > Exception > >>>> > > > >>>> > >>>> > > > >>> in > >>>> > > > >>> > >>>> > > > >>>> createBlockOutputStream java.io.EOFException > >>>> > > > >>>> 2009-11-13 20:06:09,682 INFO > org.apache.hadoop.hdfs.DFSClient: > >>>> > > > >>>> Abandoning > >>>> > > > >>>> block blk_1074853606841896259_5478048 > >>>> > > > >>>> 2009-11-13 20:06:10,334 DEBUG > org.apache.zookeeper.ClientCnxn: > >>>> Got > >>>> > > > ping > >>>> > > > >>>> response for sessionid:0x24eebf1043003c after 1ms > >>>> > > > >>>> 2009-11-13 20:06:21,018 DEBUG > org.apache.zookeeper.ClientCnxn: > >>>> Got > >>>> > > > ping > >>>> > > > >>>> response for sessionid:0x424eebf1c10004c after 0ms > >>>> > > > >>>> 2009-11-13 20:06:23,674 DEBUG > org.apache.zookeeper.ClientCnxn: > >>>> Got > >>>> > > > ping > >>>> > > > >>>> response for sessionid:0x24eebf1043003c after 0ms > >>>> > > > >>>> 2009-11-13 20:06:24,828 INFO > org.apache.hadoop.hdfs.DFSClient: > >>>> > > > Exception > >>>> > > > >>>> > >>>> > > > >>> in > >>>> > > > >>> > >>>> > > > >>>> createBlockOutputStream java.io.IOException: Bad connect > ack > >>>> with > >>>> > > > >>>> firstBadLink 192.168.100.123:50010 > >>>> > > > >>>> 2009-11-13 20:06:24,828 INFO > org.apache.hadoop.hdfs.DFSClient: > >>>> > > > >>>> Abandoning > >>>> > > > >>>> block blk_-6642544517082142289_5478063 > >>>> > > > >>>> 2009-11-13 20:06:24,828 INFO > org.apache.hadoop.hdfs.DFSClient: > >>>> > > > Exception > >>>> > > > >>>> > >>>> > > > >>> in > >>>> > > > >>> > >>>> > > > >>>> createBlockOutputStream java.net.SocketException: Too many > >>>> open > >>>> > > files > >>>> > > > >>>> 2009-11-13 20:06:24,828 INFO > org.apache.hadoop.hdfs.DFSClient: > >>>> > > > >>>> Abandoning > >>>> > > > >>>> block blk_2057511041109796090_5478063 > >>>> > > > >>>> 2009-11-13 20:06:24,928 INFO > org.apache.hadoop.hdfs.DFSClient: > >>>> > > > Exception > >>>> > > > >>>> > >>>> > > > >>> in > >>>> > > > >>> > >>>> > > > >>>> createBlockOutputStream java.net.SocketException: Too many > >>>> open > >>>> > > files > >>>> > > > >>>> 2009-11-13 20:06:24,928 INFO > org.apache.hadoop.hdfs.DFSClient: > >>>> > > > >>>> Abandoning > >>>> > > > >>>> block blk_8219260302213892894_5478064 > >>>> > > > >>>> 2009-11-13 20:06:30,855 INFO > org.apache.hadoop.hdfs.DFSClient: > >>>> > > > Exception > >>>> > > > >>>> > >>>> > > > >>> in > >>>> > > > >>> > >>>> > > > >>>> createBlockOutputStream java.net.SocketException: Too many > >>>> open > >>>> > > files > >>>> > > > >>>> 2009-11-13 20:06:30,855 INFO > org.apache.hadoop.hdfs.DFSClient: > >>>> > > > >>>> Abandoning > >>>> > > > >>>> block blk_1669205542853067709_5478235 > >>>> > > > >>>> 2009-11-13 20:06:30,905 INFO > org.apache.hadoop.hdfs.DFSClient: > >>>> > > > Exception > >>>> > > > >>>> > >>>> > > > >>> in > >>>> > > > >>> > 
>>>> > > > >>>> createBlockOutputStream java.net.SocketException: Too many > >>>> open > >>>> > > files > >>>> > > > >>>> 2009-11-13 20:06:30,905 INFO > org.apache.hadoop.hdfs.DFSClient: > >>>> > > > >>>> Abandoning > >>>> > > > >>>> block blk_9128897691346270351_5478237 > >>>> > > > >>>> 2009-11-13 20:06:30,955 INFO > org.apache.hadoop.hdfs.DFSClient: > >>>> > > > Exception > >>>> > > > >>>> > >>>> > > > >>> in > >>>> > > > >>> > >>>> > > > >>>> createBlockOutputStream java.net.SocketException: Too many > >>>> open > >>>> > > files > >>>> > > > >>>> 2009-11-13 20:06:30,955 INFO > org.apache.hadoop.hdfs.DFSClient: > >>>> > > > >>>> Abandoning > >>>> > > > >>>> block blk_1116845144864123018_5478240 > >>>> > > > >>>> 2009-11-13 20:06:34,372 DEBUG > org.apache.zookeeper.ClientCnxn: > >>>> Got > >>>> > > > ping > >>>> > > > >>>> response for sessionid:0x424eebf1c10004c after 0ms > >>>> > > > >>>> 2009-11-13 20:06:37,034 DEBUG > org.apache.zookeeper.ClientCnxn: > >>>> Got > >>>> > > > ping > >>>> > > > >>>> response for sessionid:0x24eebf1043003c after 0ms > >>>> > > > >>>> 2009-11-13 20:06:37,235 WARN > org.apache.hadoop.hdfs.DFSClient: > >>>> > > > >>>> > >>>> > > > >>> DataStreamer > >>>> > > > >>> > >>>> > > > >>>> Exception: java.io.IOException: Too many open files > >>>> > > > >>>> at sun.nio.ch.IOUtil.initPipe(Native Method) > >>>> > > > >>>> at > >>>> > > sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49) > >>>> > > > >>>> at > >>>> > > > >>>> > >>>> > > > >>>> > >>>> > > > > >>>> > > > >>>> > > >>>> > sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18) > >>>> > > > >>> > >>>> > > > >>>> at > >>>> > > > >>>> > >>>> > > > >>>> > >>>> > > > > >>>> > > > >>>> > > >>>> > org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:407) > >>>> > > > >>> > >>>> > > > >>>> at > >>>> > > > >>>> > >>>> > > > >>>> > >>>> > > > > >>>> > > > >>>> > > >>>> > org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:322) > >>>> > > > >>> > >>>> > > > >>>> at > >>>> > > > >>>> > >>>> > > > >>>> > >>>> > > > > >>>> > > > >>>> > > >>>> > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157) > >>>> > > > >>> > >>>> > > > >>>> at > >>>> > > > >>>> > >>>> > > > >>>> > >>>> > > > > >>>> > > > >>>> > > >>>> > org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146) > >>>> > > > >>> > >>>> > > > >>>> at > >>>> > > > >>>> > >>>> > > > >>>> > >>>> > > > > >>>> > > > >>>> > > >>>> > org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107) > >>>> > > > >>> > >>>> > > > >>>> at > >>>> > > > >>>> > >>>> > > > >>> > >>>> java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) > >>>> > > > >>> > >>>> > > > >>>> at > >>>> java.io.DataOutputStream.write(DataOutputStream.java:90) > >>>> > > > >>>> at > >>>> > > > >>>> > >>>> > > > >>>> > >>>> > > > > >>>> > > > >>>> > > >>>> > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2290) > >>>> > > > >>> > >>>> > > > >>>> > >>>> > > > >>>> 2009-11-13 20:06:37,235 WARN > org.apache.hadoop.hdfs.DFSClient: > >>>> > Error > >>>> > > > >>>> Recovery for block blk_8148813491785406356_5478475 bad > >>>> datanode[0] > >>>> > > > >>>> 192.168.100.123:50010 > >>>> > > > >>>> 2009-11-13 20:06:37,235 WARN > org.apache.hadoop.hdfs.DFSClient: > >>>> > Error > >>>> > > > >>>> Recovery for block blk_8148813491785406356_5478475 in > pipeline > >>>> > > > >>>> 192.168.100.123:50010, 192.168.100.134:50010, 
> >>>> > 192.168.100.122:50010 > >>>> > > : > >>>> > > > >>>> bad > >>>> > > > >>>> datanode 192.168.100.123:50010 > >>>> > > > >>>> 2009-11-13 20:06:37,436 INFO > org.apache.hadoop.hdfs.DFSClient: > >>>> > > > Exception > >>>> > > > >>>> > >>>> > > > >>> in > >>>> > > > >>> > >>>> > > > >>>> createBlockOutputStream java.net.SocketException: Too many > >>>> open > >>>> > > files > >>>> > > > >>>> 2009-11-13 20:06:37,436 INFO > org.apache.hadoop.hdfs.DFSClient: > >>>> > > > >>>> Abandoning > >>>> > > > >>>> block blk_2119727700857186236_5478498 > >>>> > > > >>>> > >>>> > > > >>>> > >>>> > > > >>>> > >>>> > > > >>>> > >>>> > > > >>>> > >>>> > > > >>>> On Thu, Nov 12, 2009 at 4:21 PM, Zhenyu Zhong < > >>>> > > > [email protected] > >>>> > > > >>>> wrote: > >>>> > > > >>>> > >>>> > > > >>>> Will do. > >>>> > > > >>>>> > >>>> > > > >>>>> thanks > >>>> > > > >>>>> zhenyu > >>>> > > > >>>>> > >>>> > > > >>>>> > >>>> > > > >>>>> On Thu, Nov 12, 2009 at 3:33 PM, stack <[email protected]> > >>>> wrote: > >>>> > > > >>>>> > >>>> > > > >>>>> Enable GC logging too on this next run (see > hbase-env.sh). > >>>> Lets > >>>> > > try > >>>> > > > >>>>>> > >>>> > > > >>>>> and > >>>> > > > >>> > >>>> > > > >>>> get > >>>> > > > >>>>>> to the bottom of whats going on. > >>>> > > > >>>>>> Thanks, > >>>> > > > >>>>>> St.Ack > >>>> > > > >>>>>> > >>>> > > > >>>>>> On Thu, Nov 12, 2009 at 12:29 PM, Zhenyu Zhong < > >>>> > > > >>>>>> > >>>> > > > >>>>> [email protected] > >>>> > > > >>> > >>>> > > > >>>> wrote: > >>>> > > > >>>>>>> > >>>> > > > >>>>>> > >>>> > > > >>>>>> I can switch the boxes that run zookeeper with the ones > >>>> that > >>>> > run > >>>> > > > >>>>>>> regionservers. > >>>> > > > >>>>>>> I will see the results later. > >>>> > > > >>>>>>> > >>>> > > > >>>>>>> FYI. The node does have the 10 minutes > >>>> > zookeeper.session.timeout > >>>> > > > >>>>>>> > >>>> > > > >>>>>> value > >>>> > > > >>> > >>>> > > > >>>> in > >>>> > > > >>>>>> > >>>> > > > >>>>>>> place. > >>>> > > > >>>>>>> > >>>> > > > >>>>>>> thanks > >>>> > > > >>>>>>> zhenyu > >>>> > > > >>>>>>> > >>>> > > > >>>>>>> On Thu, Nov 12, 2009 at 3:21 PM, stack < > [email protected]> > >>>> > wrote: > >>>> > > > >>>>>>> > >>>> > > > >>>>>>> On Thu, Nov 12, 2009 at 11:50 AM, Zhenyu Zhong < > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>> [email protected] > >>>> > > > >>>>>> > >>>> > > > >>>>>>> wrote: > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>>> In my cluster, half of the cluster have 2 disks 400GB > >>>> each > >>>> > per > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>> machine, > >>>> > > > >>>>>> > >>>> > > > >>>>>>> and > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>>> half of the cluster have 6 disks per machine. Maybe > we > >>>> > should > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>> run > >>>> > > > >>> > >>>> > > > >>>> zookeeper > >>>> > > > >>>>>>>>> on the machines with 2 disks and RS on machines with 6 > >>>> disks? > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>>> That would make most sense only in the below, it looks > >>>> like > >>>> > the > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>> RS > >>>> > > > >>> > >>>> > > > >>>> that > >>>> > > > >>>>>> > >>>> > > > >>>>>>> had > >>>> > > > >>>>>>>> issue had 4 disks? > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>> BTW, the 10 minutes zookeeper.session.timeout has been > >>>> set > >>>> > > during > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>> the > >>>> > > > >>>>>> > >>>> > > > >>>>>>> experiment. 
> >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>> And for sure this node had it in place? > >>>> > > > >>>>>>>> St.Ack > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>>> thanks > >>>> > > > >>>>>>>>> zhenyu > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>>> On Thu, Nov 12, 2009 at 2:08 PM, stack < > [email protected] > >>>> > > >>>> > > wrote: > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>>> On Thu, Nov 12, 2009 at 8:40 AM, Zhenyu Zhong < > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>> [email protected] > >>>> > > > >>>>>>> > >>>> > > > >>>>>>>> wrote: > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> Though I experienced 2 regionserver disconnection > this > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>> morning, > >>>> > > > >>> > >>>> > > > >>>> it > >>>> > > > >>>>>> > >>>> > > > >>>>>>> looks > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>>>> better from the regionserver log. (please see the > >>>> following > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>> log) > >>>> > > > >>> > >>>> > > > >>>> > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>>> http://pastebin.com/m496dbfae > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>>> I checked diskIO for one of the > >>>> > regionserver(192.168.100.116) > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>> during > >>>> > > > >>>>>>> > >>>> > > > >>>>>>>> the > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>>>> time it disconnected. > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>>> Time: 03:04:58 AM > >>>> > > > >>>>>>>>>>> Device: tps Blk_read/s Blk_wrtn/s > >>>> Blk_read > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>> Blk_wrtn > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>>> sda 105.31 5458.83 19503.64 > >>>> 9043873239 > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>> 32312473676 > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>>> sda1 2.90 6.64 99.25 > >>>> 10993934 > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>> 164433464 > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>>> sda2 1.72 23.76 30.25 > >>>> 39365817 > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>> 50124008 > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>>> sda3 0.30 0.38 3.58 > >>>> 624930 > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>> 5923000 > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>>> sda4 100.39 5428.06 19370.56 > >>>> 8992888270 > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>> 32091993204 > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>>> Is this high for you? 20k blocks/second would seem > to > >>>> be > >>>> > > high > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>> but > >>>> > > > >>> > >>>> > > > >>>> its > >>>> > > > >>>>>>> > >>>> > > > >>>>>>>> one > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>>>> disk only and its not being shared by zk anymore so > >>>> > shouldn't > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>> matter? > >>>> > > > >>>>>> > >>>> > > > >>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> I also checked the zookeeper quorum server that the > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>> regionserver > >>>> > > > >>> > >>>> > > > >>>> tried > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>>> to > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>>>> connect according to the log. 
However, I don't see > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>> 192.168.100.116 > >>>> > > > >>>>>> > >>>> > > > >>>>>>> in > >>>> > > > >>>>>>> > >>>> > > > >>>>>>>> the > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>>>> client list of the zookeeper quorum server that the > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>> regionserver > >>>> > > > >>> > >>>> > > > >>>> tried > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>>> to > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>>>> connect. > >>>> > > > >>>>>>>>>>> Would that be a problem? > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>>> Is that because the ephemeral node for the > >>>> regionserver > >>>> > had > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>> evaporated? > >>>> > > > >>>>>>> > >>>> > > > >>>>>>>> Lost > >>>> > > > >>>>>>>>>> its lease w/ zk by the time you went to look? > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> Thu Nov 12 15:04:52 UTC 2009 > >>>> > > > >>>>>>>>>>> Zookeeper version: 3.2.1-808558, built on 08/27/2009 > >>>> 18:48 > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>> GMT > >>>> > > > >>> > >>>> > > > >>>> Clients: > >>>> > > > >>>>>>>>>>> /192.168.100.127:43045 > [1](queued=0,recved=26,sent=0) > >>>> > > > >>>>>>>>>>> /192.168.100.131:39091 > [1](queued=0,recved=964,sent=0) > >>>> > > > >>>>>>>>>>> /192.168.100.124:35961 > [1](queued=0,recved=3266,sent=0) > >>>> > > > >>>>>>>>>>> /192.168.100.123:47935 > [1](queued=0,recved=233,sent=0) > >>>> > > > >>>>>>>>>>> /192.168.100.125:46931[1](queued=0,recved=2,sent=0) > >>>> > > > >>>>>>>>>>> /192.168.100.118:54924 > [1](queued=0,recved=295,sent=0) > >>>> > > > >>>>>>>>>>> /192.168.100.118:41390 > [1](queued=0,recved=2290,sent=0) > >>>> > > > >>>>>>>>>>> /192.168.100.136:42243[1](queued=0,recved=0,sent=0) > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>>> Latency min/avg/max: 0/17/6333 > >>>> > > > >>>>>>>>>>> Received: 47111 > >>>> > > > >>>>>>>>>>> Sent: 0 > >>>> > > > >>>>>>>>>>> Outstanding: 0 > >>>> > > > >>>>>>>>>>> Zxid: 0x77000083f4 > >>>> > > > >>>>>>>>>>> Mode: leader > >>>> > > > >>>>>>>>>>> Node count: 23 > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> That 6 second maximum latency is pretty bad but > should > >>>> be > >>>> > well > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>> within > >>>> > > > >>>>>> > >>>> > > > >>>>>>> the > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>>> zk > >>>> > > > >>>>>>>>>> session timeout. > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> So, problem is likely on the zk client side of the > >>>> session; > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>> i.e. > >>>> > > > >>> > >>>> > > > >>>> in > >>>> > > > >>>>>> > >>>> > > > >>>>>>> the > >>>> > > > >>>>>>> > >>>> > > > >>>>>>>> RS. > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>>>> You could enable GC logging as J-D suggested to see > if > >>>> you > >>>> > > have > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>> any > >>>> > > > >>>>>> > >>>> > > > >>>>>>> big > >>>> > > > >>>>>>> > >>>> > > > >>>>>>>> pauses, pauses > zk session timeout. > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> When the RS went down, it didn't look too heavily > >>>> loaded: > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> 1. 
2009-11-12 15:04:52,830 INFO > >>>> > > > >>>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: > >>>> Dump of > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>> metrics: > >>>> > > > >>>>>>> > >>>> > > > >>>>>>>> request=1.5166667, regions=322, stores=657, > >>>> storefiles=631, > >>>> > > > >>>>>>>>>> storefileIndexSize=61, memstoreSize=1472, > >>>> usedHeap=2819, > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>> maxHeap=4079, > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>>> blockCacheSize=658110960, blockCacheFree=197395984, > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>> blockCacheCount=9903, > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>>>> blockCacheHitRatio=99 > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> Its serving a few reads? The number of store files > >>>> seems > >>>> > > fine. > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>> Not > >>>> > > > >>>>>> > >>>> > > > >>>>>>> too > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>>> much memory used. > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> Looking at the logs, I see the Lease Still Held > >>>> exception. > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>> This > >>>> > > > >>> > >>>> > > > >>>> happens > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>>> when the RS does its regular report to the master but > the > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>> master > >>>> > > > >>> > >>>> > > > >>>> thinks > >>>> > > > >>>>>>> > >>>> > > > >>>>>>>> the > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>>>> RS has since restarted. It'll think this probably > >>>> because > >>>> > it > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>> noticed > >>>> > > > >>>>>> > >>>> > > > >>>>>>> that > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>>>> the RS's znode in zk had gone away and it considered > the > >>>> RS > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>> dead. > >>>> > > > >>> > >>>> > > > >>>> > >>>> > > > >>>>>>>>>> Looking too at your logs I see this gap in the zk > >>>> pinging: > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> 1. 2009-11-12 15:03:39,325 DEBUG > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>> org.apache.zookeeper.ClientCnxn: > >>>> > > > >>>>>> > >>>> > > > >>>>>>> Got > >>>> > > > >>>>>>> > >>>> > > > >>>>>>>> ping response for sessionid:0x224e55436ad0004 after > 0ms > >>>> > > > >>>>>>>>>> 2. 2009-11-12 15:03:43,113 DEBUG > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>> org.apache.zookeeper.ClientCnxn: > >>>> > > > >>>>>> > >>>> > > > >>>>>>> Got > >>>> > > > >>>>>>> > >>>> > > > >>>>>>>> ping response for sessionid:0x24e55436a0007d after 0ms > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> Where in the lines above it, its reporting about > every > >>>> ten > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>> seconds, > >>>> > > > >>>>>> > >>>> > > > >>>>>>> here > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>>> there is a big gap. > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> Do you have ganglia or something that will let you > look > >>>> more > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>> into > >>>> > > > >>> > >>>> > > > >>>> what > >>>> > > > >>>>>>> > >>>> > > > >>>>>>>> was > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>>>> happening on this machine around this time? Is the > >>>> machine > >>>> > > OK? > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>> It > >>>> > > > >>>>>> > >>>> > > > >>>>>>> looks > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>>> lightly loaded and you have your cluster nicely laid > out. 
> >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>> Something > >>>> > > > >>>>>> > >>>> > > > >>>>>>> odd > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>>> is > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>>>> going on. What about things like the write speed to > >>>> disk? > >>>> > In > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>> the > >>>> > > > >>> > >>>> > > > >>>> past > >>>> > > > >>>>>>> > >>>> > > > >>>>>>>> strange issues have been explained by incorrectly set > BIOS > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>> which > >>>> > > > >>> > >>>> > > > >>>> made > >>>> > > > >>>>>> > >>>> > > > >>>>>>> disks > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>>>> run at 1/100th of their proper speed. > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> St.Ack > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>>> Best, > >>>> > > > >>>>>>>>>>> zhenyu > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>>> On Wed, Nov 11, 2009 at 3:58 PM, Zhenyu Zhong < > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>> [email protected] > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>>> wrote: > >>>> > > > >>>>>>>>>>>> > >>>> > > > >>>>>>>>>>> > >>>> > > > >>>>>>>>>>> Stack > >>>> > > > >>>>>>>>>>>> > >>>> > > > >>>>>>>>>>>> I am very appreciated for your comments. > >>>> > > > >>>>>>>>>>>> I will use the zookeeper monitoring script on my > >>>> cluster > >>>> > > > >>>>>>>>>>>> > >>>> > > > >>>>>>>>>>> and > >>>> > > > >>> > >>>> > > > >>>> let > >>>> > > > >>>>>> > >>>> > > > >>>>>>> it > >>>> > > > >>>>>>> > >>>> > > > >>>>>>>> run > >>>> > > > >>>>>>>>> > >>>> > > > >>>>>>>>>> overnight to see the result. > >>>> > > > >>>>>>>>>>>> I will follow up the thread when I get anything. > >>>> > > > >>>>>>>>>>>> > >>>> > > > >>>>>>>>>>>> thanks > >>>> > > > >>>>>>>>>>>> zhenyu > >>>> > > > >>>>>>>>>>>> > >>>> > > > >>>>>>>>>>>> > >>>> > > > >>>>>>>>>>>> On Wed, Nov 11, 2009 at 3:52 PM, stack < > >>>> [email protected]> > >>>> > > > >>>>>>>>>>>> > >>>> > > > >>>>>>>>>>> wrote: > >>>> > > > >>>>>> > >>>> > > > >>>>>>> > >>>> > > > >>>>>>>>>>>> I see these in your log too: > >>>> > > > >>>>>>>>>>>>> > >>>> > > > >>>>>>>>>>>>> > >>>> > > > >>>>>>>>>>>>> 1. 2009-11-11 04:27:19,018 DEBUG > >>>> > > > >>>>>>>>>>>>> > >>>> > > > >>>>>>>>>>>> org.apache.zookeeper.ClientCnxn: > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>>> Got > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>>> ping response for sessionid:0x424dfd908c50009 after > >>>> > > > >>>>>>>>>>>>> > >>>> > > > >>>>>>>>>>>> 4544ms > >>>> > > > >>> > >>>> > > > >>>> 2. 2009-11-11 04:27:19,018 DEBUG > >>>> > > > >>>>>>>>>>>>> > >>>> > > > >>>>>>>>>>>> org.apache.zookeeper.ClientCnxn: > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>>> Got > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>>> ping response for sessionid:0x24dfd90c810002 after > >>>> > > > >>>>>>>>>>>>> > >>>> > > > >>>>>>>>>>>> 4548ms > >>>> > > > >>> > >>>> > > > >>>> 3. 2009-11-11 04:27:43,960 DEBUG > >>>> > > > >>>>>>>>>>>>> > >>>> > > > >>>>>>>>>>>> org.apache.zookeeper.ClientCnxn: > >>>> > > > >>>>>>>> > >>>> > > > >>>>>>>>> Got > >>>> > > > >>>>>>>>>> > >>>> > > > >>>>>>>>>>> ping response for sessionid:0x424dfd908c50009 after > >>>> > > > >>>>>>>>>>>>> > >>>> > > > >>>>>>>>>>>> 9030ms > >>>> > > > >>> > >>>> > > > >>>> 4. 2009-11-11 04:27:43,960 DEBUG > >>>> > > > >>>>>>>>>>>>> > >>>> > > > >>>>>>>>>>>> org.apache.zookeeper. 
> >>>> > > > >>>>>>>> > >>>> > > > >>>>>>> > >>>> > > > > >>>> > > > >>>> > > >>>> > >>> > >>> > >> > > >
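P.S. For reference, a minimal sketch of the client-side session timeout override discussed above, for hbase-site.xml on the region servers; the property name is the one from hbase-default.xml. One thing worth checking (not confirmed in this thread): ZooKeeper servers cap the negotiated session timeout at 20 times tickTime, so if the ensemble's tickTime is around 3 seconds, a 10-minute request would be negotiated down to about 60 seconds, which would roughly match the 60-second expirations seen here.

  <!-- hbase-site.xml (sketch): request a 10-minute ZooKeeper session timeout -->
  <property>
    <name>zookeeper.session.timeout</name>
    <value>600000</value> <!-- milliseconds -->
  </property>
  <!-- The ensemble caps the negotiated value at 20 x tickTime
       (maxSessionTimeout only became separately configurable in later
       ZooKeeper releases), so the server-side tickTime may also need
       raising for the full value to take effect. -->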
