Looks like http://pero.blogs.aprilmayjune.org/2009/01/22/hadoop-and-linux-kernel-2627-epoll-limits/
J-D On Fri, Nov 13, 2009 at 1:12 PM, Zhenyu Zhong <[email protected]> wrote: > Hi, > > > After I re-organize the cluster, the experiment ran into problem faster than > before. > > Basically the changes are to use machines with less resources as zookeeper > quorums and machines with more resources as regionserver. > > From the log, I still see the pause around 1 minute. > I enabled the GC logging, please see > > http://pastebin.com/m1d4ce0f1 > > for details. > In general I don't see many pauses in the GC. > > What is more interesting, 1 hour after the 1st regionserver disconnected, > the master log complained about too many open files. This didn't happen > before. > I checked the system OS setup as well as the limits.conf. I also checked the > running processes. The total open files don't reach the limit. I am confused > a bit. > > > Please see the following master log. > > 2009-11-13 20:06:02,114 INFO org.apache.hadoop.hbase.master.BaseScanner: > RegionManager.metaScanner scan of 4658 row(s) of meta region {server: > 192.168.100.128:60021, regionname: .META.,,1, startKey: <>} complete > 2009-11-13 20:06:02,114 INFO org.apache.hadoop.hbase.master.BaseScanner: All > 1 .META. region(s) scanned > 2009-11-13 20:06:07,677 DEBUG org.apache.zookeeper.ClientCnxn: Got ping > response for sessionid:0x424eebf1c10004c after 3ms > 2009-11-13 20:06:08,178 INFO org.apache.hadoop.hdfs.DFSClient: Exception in > createBlockOutputStream java.io.IOException: Bad connect ack with > firstBadLink 192.168.100.123:50010 > 2009-11-13 20:06:08,178 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning > block blk_-2808245019291145247_5478039 > 2009-11-13 20:06:09,682 INFO org.apache.hadoop.hdfs.DFSClient: Exception in > createBlockOutputStream java.io.EOFException > 2009-11-13 20:06:09,682 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning > block blk_1074853606841896259_5478048 > 2009-11-13 20:06:10,334 DEBUG org.apache.zookeeper.ClientCnxn: Got ping > response for sessionid:0x24eebf1043003c after 1ms > 2009-11-13 20:06:21,018 DEBUG org.apache.zookeeper.ClientCnxn: Got ping > response for sessionid:0x424eebf1c10004c after 0ms > 2009-11-13 20:06:23,674 DEBUG org.apache.zookeeper.ClientCnxn: Got ping > response for sessionid:0x24eebf1043003c after 0ms > 2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Exception in > createBlockOutputStream java.io.IOException: Bad connect ack with > firstBadLink 192.168.100.123:50010 > 2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning > block blk_-6642544517082142289_5478063 > 2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Exception in > createBlockOutputStream java.net.SocketException: Too many open files > 2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning > block blk_2057511041109796090_5478063 > 2009-11-13 20:06:24,928 INFO org.apache.hadoop.hdfs.DFSClient: Exception in > createBlockOutputStream java.net.SocketException: Too many open files > 2009-11-13 20:06:24,928 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning > block blk_8219260302213892894_5478064 > 2009-11-13 20:06:30,855 INFO org.apache.hadoop.hdfs.DFSClient: Exception in > createBlockOutputStream java.net.SocketException: Too many open files > 2009-11-13 20:06:30,855 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning > block blk_1669205542853067709_5478235 > 2009-11-13 20:06:30,905 INFO org.apache.hadoop.hdfs.DFSClient: Exception in > createBlockOutputStream java.net.SocketException: Too many open files > 2009-11-13 20:06:30,905 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning > block blk_9128897691346270351_5478237 > 2009-11-13 20:06:30,955 INFO org.apache.hadoop.hdfs.DFSClient: Exception in > createBlockOutputStream java.net.SocketException: Too many open files > 2009-11-13 20:06:30,955 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning > block blk_1116845144864123018_5478240 > 2009-11-13 20:06:34,372 DEBUG org.apache.zookeeper.ClientCnxn: Got ping > response for sessionid:0x424eebf1c10004c after 0ms > 2009-11-13 20:06:37,034 DEBUG org.apache.zookeeper.ClientCnxn: Got ping > response for sessionid:0x24eebf1043003c after 0ms > 2009-11-13 20:06:37,235 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer > Exception: java.io.IOException: Too many open files > at sun.nio.ch.IOUtil.initPipe(Native Method) > at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49) > at > sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18) > at > org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:407) > at > org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:322) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157) > at > org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146) > at > org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107) > at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) > at java.io.DataOutputStream.write(DataOutputStream.java:90) > at > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2290) > > 2009-11-13 20:06:37,235 WARN org.apache.hadoop.hdfs.DFSClient: Error > Recovery for block blk_8148813491785406356_5478475 bad datanode[0] > 192.168.100.123:50010 > 2009-11-13 20:06:37,235 WARN org.apache.hadoop.hdfs.DFSClient: Error > Recovery for block blk_8148813491785406356_5478475 in pipeline > 192.168.100.123:50010, 192.168.100.134:50010, 192.168.100.122:50010: bad > datanode 192.168.100.123:50010 > 2009-11-13 20:06:37,436 INFO org.apache.hadoop.hdfs.DFSClient: Exception in > createBlockOutputStream java.net.SocketException: Too many open files > 2009-11-13 20:06:37,436 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning > block blk_2119727700857186236_5478498 > > > > > > On Thu, Nov 12, 2009 at 4:21 PM, Zhenyu Zhong <[email protected]>wrote: > >> Will do. >> >> thanks >> zhenyu >> >> >> On Thu, Nov 12, 2009 at 3:33 PM, stack <[email protected]> wrote: >> >>> Enable GC logging too on this next run (see hbase-env.sh). Lets try and >>> get >>> to the bottom of whats going on. >>> Thanks, >>> St.Ack >>> >>> On Thu, Nov 12, 2009 at 12:29 PM, Zhenyu Zhong <[email protected] >>> >wrote: >>> >>> > I can switch the boxes that run zookeeper with the ones that run >>> > regionservers. >>> > I will see the results later. >>> > >>> > FYI. The node does have the 10 minutes zookeeper.session.timeout value >>> in >>> > place. >>> > >>> > thanks >>> > zhenyu >>> > >>> > On Thu, Nov 12, 2009 at 3:21 PM, stack <[email protected]> wrote: >>> > >>> > > On Thu, Nov 12, 2009 at 11:50 AM, Zhenyu Zhong < >>> [email protected] >>> > > >wrote: >>> > > >>> > > > >>> > > > In my cluster, half of the cluster have 2 disks 400GB each per >>> machine, >>> > > and >>> > > > half of the cluster have 6 disks per machine. Maybe we should run >>> > > > zookeeper >>> > > > on the machines with 2 disks and RS on machines with 6 disks? >>> > > > >>> > > > That would make most sense only in the below, it looks like the RS >>> that >>> > > had >>> > > issue had 4 disks? >>> > > >>> > > >>> > > >>> > > > BTW, the 10 minutes zookeeper.session.timeout has been set during >>> the >>> > > > experiment. >>> > > > >>> > > >>> > > And for sure this node had it in place? >>> > > St.Ack >>> > > >>> > > >>> > > >>> > > > >>> > > > thanks >>> > > > zhenyu >>> > > > >>> > > > On Thu, Nov 12, 2009 at 2:08 PM, stack <[email protected]> wrote: >>> > > > >>> > > > > On Thu, Nov 12, 2009 at 8:40 AM, Zhenyu Zhong < >>> > [email protected] >>> > > > > >wrote: >>> > > > > >>> > > > > >>> > > > > > Though I experienced 2 regionserver disconnection this morning, >>> it >>> > > > looks >>> > > > > > better from the regionserver log. (please see the following log) >>> > > > > > >>> > > > > > >>> > > > > > http://pastebin.com/m496dbfae >>> > > > > > >>> > > > > > >>> > > > > > I checked diskIO for one of the regionserver(192.168.100.116) >>> > during >>> > > > the >>> > > > > > time it disconnected. >>> > > > > > >>> > > > > > Time: 03:04:58 AM >>> > > > > > Device: tps Blk_read/s Blk_wrtn/s Blk_read >>> > > Blk_wrtn >>> > > > > > sda 105.31 5458.83 19503.64 9043873239 >>> > > 32312473676 >>> > > > > > sda1 2.90 6.64 99.25 10993934 >>> > > 164433464 >>> > > > > > sda2 1.72 23.76 30.25 39365817 >>> > > 50124008 >>> > > > > > sda3 0.30 0.38 3.58 624930 >>> > > 5923000 >>> > > > > > sda4 100.39 5428.06 19370.56 8992888270 >>> > > 32091993204 >>> > > > > > >>> > > > > > >>> > > > > Is this high for you? 20k blocks/second would seem to be high but >>> > its >>> > > > one >>> > > > > disk only and its not being shared by zk anymore so shouldn't >>> matter? >>> > > > > >>> > > > > >>> > > > > > I also checked the zookeeper quorum server that the regionserver >>> > > tried >>> > > > to >>> > > > > > connect according to the log. However, I don't see >>> 192.168.100.116 >>> > in >>> > > > the >>> > > > > > client list of the zookeeper quorum server that the regionserver >>> > > tried >>> > > > to >>> > > > > > connect. >>> > > > > > Would that be a problem? >>> > > > > > >>> > > > > > >>> > > > > Is that because the ephemeral node for the regionserver had >>> > evaporated? >>> > > > > Lost >>> > > > > its lease w/ zk by the time you went to look? >>> > > > > >>> > > > > >>> > > > > >>> > > > > >>> > > > > > Thu Nov 12 15:04:52 UTC 2009 >>> > > > > > Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT >>> > > > > > Clients: >>> > > > > > /192.168.100.127:43045[1](queued=0,recved=26,sent=0) >>> > > > > > /192.168.100.131:39091[1](queued=0,recved=964,sent=0) >>> > > > > > /192.168.100.124:35961[1](queued=0,recved=3266,sent=0) >>> > > > > > /192.168.100.123:47935[1](queued=0,recved=233,sent=0) >>> > > > > > /192.168.100.125:46931[1](queued=0,recved=2,sent=0) >>> > > > > > /192.168.100.118:54924[1](queued=0,recved=295,sent=0) >>> > > > > > /192.168.100.118:41390[1](queued=0,recved=2290,sent=0) >>> > > > > > /192.168.100.136:42243[1](queued=0,recved=0,sent=0) >>> > > > > > >>> > > > > > Latency min/avg/max: 0/17/6333 >>> > > > > > Received: 47111 >>> > > > > > Sent: 0 >>> > > > > > Outstanding: 0 >>> > > > > > Zxid: 0x77000083f4 >>> > > > > > Mode: leader >>> > > > > > Node count: 23 >>> > > > > > >>> > > > > >>> > > > > >>> > > > > That 6 second maximum latency is pretty bad but should be well >>> within >>> > > the >>> > > > > zk >>> > > > > session timeout. >>> > > > > >>> > > > > So, problem is likely on the zk client side of the session; i.e. >>> in >>> > the >>> > > > RS. >>> > > > > You could enable GC logging as J-D suggested to see if you have >>> any >>> > big >>> > > > > pauses, pauses > zk session timeout. >>> > > > > >>> > > > > When the RS went down, it didn't look too heavily loaded: >>> > > > > >>> > > > > >>> > > > > 1. 2009-11-12 15:04:52,830 INFO >>> > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of >>> > metrics: >>> > > > > request=1.5166667, regions=322, stores=657, storefiles=631, >>> > > > > storefileIndexSize=61, memstoreSize=1472, usedHeap=2819, >>> > > maxHeap=4079, >>> > > > > blockCacheSize=658110960, blockCacheFree=197395984, >>> > > > blockCacheCount=9903, >>> > > > > blockCacheHitRatio=99 >>> > > > > >>> > > > > Its serving a few reads? The number of store files seems fine. >>> Not >>> > > too >>> > > > > much memory used. >>> > > > > >>> > > > > Looking at the logs, I see the Lease Still Held exception. This >>> > > happens >>> > > > > when the RS does its regular report to the master but the master >>> > thinks >>> > > > the >>> > > > > RS has since restarted. It'll think this probably because it >>> noticed >>> > > > that >>> > > > > the RS's znode in zk had gone away and it considered the RS dead. >>> > > > > >>> > > > > Looking too at your logs I see this gap in the zk pinging: >>> > > > > >>> > > > > >>> > > > > 1. 2009-11-12 15:03:39,325 DEBUG >>> org.apache.zookeeper.ClientCnxn: >>> > Got >>> > > > > ping response for sessionid:0x224e55436ad0004 after 0ms >>> > > > > 2. 2009-11-12 15:03:43,113 DEBUG >>> org.apache.zookeeper.ClientCnxn: >>> > Got >>> > > > > ping response for sessionid:0x24e55436a0007d after 0ms >>> > > > > >>> > > > > Where in the lines above it, its reporting about every ten >>> seconds, >>> > > here >>> > > > > there is a big gap. >>> > > > > >>> > > > > Do you have ganglia or something that will let you look more into >>> > what >>> > > > was >>> > > > > happening on this machine around this time? Is the machine OK? >>> It >>> > > looks >>> > > > > lightly loaded and you have your cluster nicely laid out. >>> Something >>> > > odd >>> > > > is >>> > > > > going on. What about things like the write speed to disk? In the >>> > past >>> > > > > strange issues have been explained by incorrectly set BIOS which >>> made >>> > > > disks >>> > > > > run at 1/100th of their proper speed. >>> > > > > >>> > > > > St.Ack >>> > > > > >>> > > > > >>> > > > > >>> > > > > >>> > > > > >>> > > > > >>> > > > > >>> > > > > > >>> > > > > > Best, >>> > > > > > zhenyu >>> > > > > > >>> > > > > > >>> > > > > > On Wed, Nov 11, 2009 at 3:58 PM, Zhenyu Zhong < >>> > > [email protected] >>> > > > > > >wrote: >>> > > > > > >>> > > > > > > Stack >>> > > > > > > >>> > > > > > > I am very appreciated for your comments. >>> > > > > > > I will use the zookeeper monitoring script on my cluster and >>> let >>> > it >>> > > > run >>> > > > > > > overnight to see the result. >>> > > > > > > I will follow up the thread when I get anything. >>> > > > > > > >>> > > > > > > thanks >>> > > > > > > zhenyu >>> > > > > > > >>> > > > > > > >>> > > > > > > On Wed, Nov 11, 2009 at 3:52 PM, stack <[email protected]> >>> wrote: >>> > > > > > > >>> > > > > > >> I see these in your log too: >>> > > > > > >> >>> > > > > > >> >>> > > > > > >> 1. 2009-11-11 04:27:19,018 DEBUG >>> > > org.apache.zookeeper.ClientCnxn: >>> > > > > Got >>> > > > > > >> ping response for sessionid:0x424dfd908c50009 after 4544ms >>> > > > > > >> 2. 2009-11-11 04:27:19,018 DEBUG >>> > > org.apache.zookeeper.ClientCnxn: >>> > > > > Got >>> > > > > > >> ping response for sessionid:0x24dfd90c810002 after 4548ms >>> > > > > > >> 3. 2009-11-11 04:27:43,960 DEBUG >>> > > org.apache.zookeeper.ClientCnxn: >>> > > > > Got >>> > > > > > >> ping response for sessionid:0x424dfd908c50009 after 9030ms >>> > > > > > >> 4. 2009-11-11 04:27:43,960 DEBUG >>> > > org.apache.zookeeper.ClientCnxn: >>> > > > > Got >>> > > > > > >> ping response for sessionid:0x24dfd90c810002 after 9030ms >>> > > > > > >> 5. 2009-11-11 04:27:48,830 DEBUG >>> > > org.apache.zookeeper.ClientCnxn: >>> > > > > Got >>> > > > > > >> ping response for sessionid:0x424dfd908c50009 after 0ms >>> > > > > > >> 6. 2009-11-11 04:27:48,830 DEBUG >>> > > org.apache.zookeeper.ClientCnxn: >>> > > > > Got >>> > > > > > >> ping response for sessionid:0x24dfd90c810002 after 0ms >>> > > > > > >> >>> > > > > > >> Which would indicate lag talking to zk too. >>> > > > > > >> >>> > > > > > >> Appending HLog also seems to get steadily worse. >>> > > > > > >> >>> > > > > > >> I also see these: >>> > > > > > >> >>> > > > > > >> >>> > > > > > >> 1. 2009-11-11 04:27:01,233 WARN >>> > org.apache.zookeeper.ClientCnxn: >>> > > > > > >> Exception closing session 0x24dfd90c810002 to >>> > > > > > >> sun.nio.ch.selectionkeyi...@5e16a7e3 >>> > > > > > >> 2. java.io.IOException: Read error rc = -1 >>> > > > > > >> java.nio.DirectByteBuffer[pos=0 lim=4 cap=4] >>> > > > > > >> 3. at >>> > > > > > >> >>> > > > > >>> org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701) >>> > > > > > >> 4. at >>> > > > > > >> >>> > > > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945) >>> > > > > > >> >>> > > > > > >> >>> > > > > > >> Were all 5 quorum members participating? Check by doing >>> "echo >>> > > > stat|nc >>> > > > > > >> QUORUM_HOST port" for each member of quorum.... the below >>> might >>> > > > help: >>> > > > > > >> >>> > > > > > >> members="1,2,3,4,5" >>> > > > > > >> while true; do date; for i in $members; do echo $i; echo stat >>> | >>> > nc >>> > > > $i >>> > > > > > >> 2181; >>> > > > > > >> done; sleep 10; done >>> > > > > > >> >>> > > > > > >> We need to improve our zk_dump. It just reports quorum >>> > members... >>> > > > not >>> > > > > > if >>> > > > > > >> they are actually participating or not. The above will tell >>> you >>> > > if >>> > > > > dead >>> > > > > > >> or >>> > > > > > >> alive. >>> > > > > > >> >>> > > > > > >> Please report back after checking above and following the >>> > > > Jean-Daniel >>> > > > > > >> prescription. >>> > > > > > >> >>> > > > > > >> Thanks, >>> > > > > > >> St.Ack >>> > > > > > >> >>> > > > > > >> >>> > > > > > >> >>> > > > > > >> On Wed, Nov 11, 2009 at 12:29 PM, Zhenyu Zhong < >>> > > > > [email protected] >>> > > > > > >> >wrote: >>> > > > > > >> >>> > > > > > >> > Thanks a lot for the help. >>> > > > > > >> > >>> > > > > > >> > I understand that the change of disk configurations should >>> be >>> > > done >>> > > > > > >> > progressively such that HDFS can balance and recover from a >>> > disk >>> > > > > going >>> > > > > > >> > away. >>> > > > > > >> > >>> > > > > > >> > The data we are trying to import into HBase is about >>> > 200million >>> > > > > rows, >>> > > > > > >> each >>> > > > > > >> > row is about hundreds of bytes. Basically, we read in a >>> text >>> > > file >>> > > > > from >>> > > > > > >> > HDFS, >>> > > > > > >> > then for each row in the text file, we fetch data from >>> HBase >>> > > > table, >>> > > > > > we >>> > > > > > >> > compare the column value and update the latest value back >>> into >>> > > > > HBase. >>> > > > > > I >>> > > > > > >> was >>> > > > > > >> > suggested to use CheckAndPut. Anyway this is the common >>> > usecase >>> > > > on >>> > > > > my >>> > > > > > >> > side. >>> > > > > > >> > >>> > > > > > >> > >>> > > > > > >> > Many thanks >>> > > > > > >> > zhenyu >>> > > > > > >> > >>> > > > > > >> > >>> > > > > > >> > >>> > > > > > >> > On Wed, Nov 11, 2009 at 3:03 PM, Jean-Daniel Cryans < >>> > > > > > >> [email protected] >>> > > > > > >> > >wrote: >>> > > > > > >> > >>> > > > > > >> > > 6 disks, 19 machines and you see those kind of issues? >>> I'm >>> > > quite >>> > > > > > >> > > surprised and I want to learn more about what you trying >>> to >>> > > to. >>> > > > > > >> > > >>> > > > > > >> > > But first, if hbase is managing ZK for you then you can >>> > change >>> > > > the >>> > > > > > >> > > value of hbase.zookeeper.property.dataDir so that it >>> points >>> > to >>> > > > its >>> > > > > > own >>> > > > > > >> > > mount. Make sure you also change dfs.data.dir in >>> > > > > conf/hdfs-site.xml >>> > > > > > >> > > (hadoop stuff) so that it stops pointing there (you >>> should >>> > > have >>> > > > 6 >>> > > > > > >> > > values there at the moment if you configured it >>> correctly). >>> > > > Doing >>> > > > > > this >>> > > > > > >> > > could result in data loss if you do it all at once so do >>> the >>> > > > > change >>> > > > > > >> > > progressively. This also means that you may have to keep >>> a >>> > > > > seperate >>> > > > > > >> > > HDFS configuration for those particular nodes. Please ask >>> > more >>> > > > > > >> > > questions if it's not very clear. >>> > > > > > >> > > >>> > > > > > >> > > WRT to dedicating nodes to Zookeeeper, it makes sense >>> only >>> > if >>> > > > you >>> > > > > > have >>> > > > > > >> > > a lot of nodes but in your case, unless there's also CPU >>> > > > > contention, >>> > > > > > >> > > you're good with only a dedicated disk. >>> > > > > > >> > > >>> > > > > > >> > > Now for what you are trying to do, is this a massive >>> import >>> > > that >>> > > > > > goes >>> > > > > > >> > > over many hours? How big is each row? I'd love to get a >>> lot >>> > of >>> > > > > > details >>> > > > > > >> > > to help you optimize your cluster for your particular >>> > problem. >>> > > > > > >> > > >>> > > > > > >> > > Thx, >>> > > > > > >> > > >>> > > > > > >> > > J-D >>> > > > > > >> > > >>> > > > > > >> > > On Wed, Nov 11, 2009 at 11:51 AM, Zhenyu Zhong < >>> > > > > > >> [email protected]> >>> > > > > > >> > > wrote: >>> > > > > > >> > > > J-D, >>> > > > > > >> > > > >>> > > > > > >> > > > Thanks a lot for the insightful comments. >>> > > > > > >> > > > We are running 5 zookeeper quorums now. On these 5 >>> quorum >>> > > > > servers, >>> > > > > > >> we >>> > > > > > >> > > also >>> > > > > > >> > > > run datanode, tasktracker and regionserver. >>> > > > > > >> > > > Now we have 6 disks per machine. >>> > > > > > >> > > > If it is the disk IO saturation, is there anyway to >>> > > configure >>> > > > > > >> zookeeper >>> > > > > > >> > > to >>> > > > > > >> > > > write to a seperate disk which is not the same one used >>> by >>> > > > HDFS? >>> > > > > > >> > > > Or shall we run zookeeper quorums on different nodes >>> other >>> > > > than >>> > > > > > >> HDFS? >>> > > > > > >> > > > >>> > > > > > >> > > > Are HLogs generated by Regionservers? Would it >>> alleviate >>> > the >>> > > > > issue >>> > > > > > >> if >>> > > > > > >> > we >>> > > > > > >> > > use >>> > > > > > >> > > > the first 14 for regionserver, and the rest of 5 for >>> > > > > zookeeper?(In >>> > > > > > >> that >>> > > > > > >> > > > sense, the node running zookeeper quorum won't run >>> > > > regionserver) >>> > > > > > >> > > > >>> > > > > > >> > > > I am very appreciated for your comments. >>> > > > > > >> > > > >>> > > > > > >> > > > Best, >>> > > > > > >> > > > zhenyu >>> > > > > > >> > > > >>> > > > > > >> > > > >>> > > > > > >> > > > >>> > > > > > >> > > > On Wed, Nov 11, 2009 at 2:30 PM, Jean-Daniel Cryans < >>> > > > > > >> > [email protected] >>> > > > > > >> > > >wrote: >>> > > > > > >> > > > >>> > > > > > >> > > >> The problem seems to be that the IO is saturated. When >>> > you >>> > > > see >>> > > > > > >> this: >>> > > > > > >> > > >> >>> > > > > > >> > > >> # >>> > > > > > >> > > >> 2009-11-11 04:28:10,308 WARN >>> > > > > > >> > > >> org.apache.hadoop.hbase.regionserver.HLog: IPC Server >>> > > handler >>> > > > 9 >>> > > > > > on >>> > > > > > >> > > >> 60021 took 6819ms appending an edit to hlog; >>> > editcount=370 >>> > > > > > >> > > >> # >>> > > > > > >> > > >> 2009-11-11 04:28:10,308 WARN >>> > > > > > >> > > >> org.apache.hadoop.hbase.regionserver.HLog: IPC Server >>> > > handler >>> > > > 2 >>> > > > > > on >>> > > > > > >> > > >> 60021 took 4412ms appending an edit to hlog; >>> > editcount=371 >>> > > > > > >> > > >> >>> > > > > > >> > > >> It's a sign that HDFS is slow. I guess that the >>> zookeeper >>> > > > > quorum >>> > > > > > >> > > >> members are on those same nodes using the same disks? >>> If >>> > > so, >>> > > > > > >> there's >>> > > > > > >> > > >> the problem because Zookeeper needs to be able to >>> commit >>> > > > > > >> transactions >>> > > > > > >> > > >> very fast and if your disks are already hogged then >>> they >>> > > have >>> > > > > to >>> > > > > > >> wait >>> > > > > > >> > > >> and you can get disconnections. Do you still have 19 >>> > quorum >>> > > > > > >> members? >>> > > > > > >> > > >> How many disks do you have per machine? >>> > > > > > >> > > >> >>> > > > > > >> > > >> Thx >>> > > > > > >> > > >> >>> > > > > > >> > > >> J-D >>> > > > > > >> > > >> >>> > > > > > >> > > >> On Wed, Nov 11, 2009 at 11:22 AM, Zhenyu Zhong < >>> > > > > > >> > [email protected] >>> > > > > > >> > > > >>> > > > > > >> > > >> wrote: >>> > > > > > >> > > >> > Please take a look at the following regionserver >>> log. >>> > > > > > >> > > >> > The error started to come out from 4:24am UTC to >>> 4:29am >>> > > > UTC. >>> > > > > > >> > > >> > >>> > > > > > >> > > >> > http://pastebin.com/m3717d064 >>> > > > > > >> > > >> > >>> > > > > > >> > > >> > Thanks in advance. >>> > > > > > >> > > >> > >>> > > > > > >> > > >> > Best, >>> > > > > > >> > > >> > zhenyu >>> > > > > > >> > > >> > >>> > > > > > >> > > >> > >>> > > > > > >> > > >> > On Wed, Nov 11, 2009 at 2:01 PM, Jean-Daniel Cryans >>> < >>> > > > > > >> > > [email protected] >>> > > > > > >> > > >> >wrote: >>> > > > > > >> > > >> > >>> > > > > > >> > > >> >> Could you paste a bigger log trace (interested to >>> see >>> > at >>> > > > > least >>> > > > > > >> all >>> > > > > > >> > > 300 >>> > > > > > >> > > >> >> lines before that) in a pastebin? >>> > > > > > >> > > >> >> >>> > > > > > >> > > >> >> Thx, >>> > > > > > >> > > >> >> >>> > > > > > >> > > >> >> J-D >>> > > > > > >> > > >> >> >>> > > > > > >> > > >> >> On Wed, Nov 11, 2009 at 10:56 AM, Zhenyu Zhong < >>> > > > > > >> > > [email protected] >>> > > > > > >> > > >> > >>> > > > > > >> > > >> >> wrote: >>> > > > > > >> > > >> >> > J-D, >>> > > > > > >> > > >> >> > >>> > > > > > >> > > >> >> > Thanks for the reply. >>> > > > > > >> > > >> >> > >>> > > > > > >> > > >> >> > The problem I have is one or more of the >>> > regionservers >>> > > > > > >> disconnect >>> > > > > > >> > > from >>> > > > > > >> > > >> >> the >>> > > > > > >> > > >> >> > cluster. >>> > > > > > >> > > >> >> > >>> > > > > > >> > > >> >> > I checked the resource usage including memory >>> Heap >>> > > usage >>> > > > > for >>> > > > > > >> > > >> >> regionserver, >>> > > > > > >> > > >> >> > and found nothing suspicious. The regionserver >>> heap >>> > > > usage >>> > > > > is >>> > > > > > >> way >>> > > > > > >> > > below >>> > > > > > >> > > >> >> the >>> > > > > > >> > > >> >> > maxHeap defined in jvm. >>> > > > > > >> > > >> >> > >>> > > > > > >> > > >> >> > So I checked zookeeper log, then I found the >>> > following >>> > > > > > >> > > >> >> > BufferUnderflowException, eventually it leads to >>> > > > > > >> SocketException: >>> > > > > > >> > > >> >> Transport >>> > > > > > >> > > >> >> > endpoint is not connected. >>> > > > > > >> > > >> >> > >>> > > > > > >> > > >> >> > >>> > > > > > >> > > >> >> > I also checked the regionserver log. It looks >>> like >>> > the >>> > > > > > >> > regionserver >>> > > > > > >> > > >> tried >>> > > > > > >> > > >> >> to >>> > > > > > >> > > >> >> > connect to other zookeeper quorum server. It >>> > connects >>> > > > > > >> > successfully, >>> > > > > > >> > > >> but >>> > > > > > >> > > >> >> then >>> > > > > > >> > > >> >> > it complained about SocketException: transport >>> > > endpoint >>> > > > is >>> > > > > > not >>> > > > > > >> > > >> connected. >>> > > > > > >> > > >> >> > >>> > > > > > >> > > >> >> > So both zookeeper log and regionserver log told >>> me >>> > > that >>> > > > > the >>> > > > > > >> > > transport >>> > > > > > >> > > >> >> > endpoint is not connected. I don't know why. >>> > > > > > >> > > >> >> > >>> > > > > > >> > > >> >> > I would be very appreciated if you could point me >>> to >>> > > the >>> > > > > > right >>> > > > > > >> > > >> direction. >>> > > > > > >> > > >> >> > >>> > > > > > >> > > >> >> > Best, >>> > > > > > >> > > >> >> > >>> > > > > > >> > > >> >> > >>> > > > > > >> > > >> >> > Regionserver log: >>> > > > > > >> > > >> >> > 2009-11-11 04:24:53,718 INFO >>> > > > > > org.apache.zookeeper.ClientCnxn: >>> > > > > > >> > > >> Attempting >>> > > > > > >> > > >> >> > connection to server superpyxis0018/ >>> > > > 192.168.100.132:2181 >>> > > > > > >> > > >> >> > 2009-11-11 04:24:53,718 INFO >>> > > > > > org.apache.zookeeper.ClientCnxn: >>> > > > > > >> > > Priming >>> > > > > > >> > > >> >> > connection to >>> > > java.nio.channels.SocketChannel[connected >>> > > > > > >> local=/ >>> > > > > > >> > > >> >> > 192.168.100.131:43065 remote=superpyxis0018/ >>> > > > > > >> 192.168.100.132:2181 >>> > > > > > >> > ] >>> > > > > > >> > > >> >> > 2009-11-11 04:24:53,718 INFO >>> > > > > > org.apache.zookeeper.ClientCnxn: >>> > > > > > >> > > Server >>> > > > > > >> > > >> >> > connection successful >>> > > > > > >> > > >> >> > 2009-11-11 04:24:53,768 INFO >>> > > > > > >> > > >> >> > >>> org.apache.hadoop.hbase.regionserver.HRegionServer: >>> > > Got >>> > > > > > >> ZooKeeper >>> > > > > > >> > > >> event, >>> > > > > > >> > > >> >> > state: Disconnected, type: None, path: null >>> > > > > > >> > > >> >> > 2009-11-11 04:24:54,118 INFO >>> > > > > > org.apache.zookeeper.ClientCnxn: >>> > > > > > >> > > >> Attempting >>> > > > > > >> > > >> >> > connection to server superpyxis0017/ >>> > > > 192.168.100.131:2181 >>> > > > > > >> > > >> >> > 2009-11-11 04:24:54,118 INFO >>> > > > > > org.apache.zookeeper.ClientCnxn: >>> > > > > > >> > > Priming >>> > > > > > >> > > >> >> > connection to >>> > > java.nio.channels.SocketChannel[connected >>> > > > > > >> local=/ >>> > > > > > >> > > >> >> > 192.168.100.131:46903 remote=superpyxis0017/ >>> > > > > > >> 192.168.100.131:2181 >>> > > > > > >> > ] >>> > > > > > >> > > >> >> > 2009-11-11 04:24:54,118 INFO >>> > > > > > org.apache.zookeeper.ClientCnxn: >>> > > > > > >> > > Server >>> > > > > > >> > > >> >> > connection successful >>> > > > > > >> > > >> >> > 2009-11-11 04:24:54,118 WARN >>> > > > > > org.apache.zookeeper.ClientCnxn: >>> > > > > > >> > > >> Exception >>> > > > > > >> > > >> >> > closing session 0x24dfd90c810002 to >>> > > > > > >> > > >> sun.nio.ch.selectionkeyi...@2dd78ccf >>> > > > > > >> > > >> >> > java.io.IOException: Read error rc = -1 >>> > > > > > >> > > >> java.nio.DirectByteBuffer[pos=0 >>> > > > > > >> > > >> >> > lim=4 cap=4] >>> > > > > > >> > > >> >> > at >>> > > > > > >> > > >> >> > >>> > > > > > >> > > >>> > > > > >>> org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701) >>> > > > > > >> > > >> >> > at >>> > > > > > >> > > >> >> > >>> > > > > > >> > >>> > > > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945) >>> > > > > > >> > > >> >> > 2009-11-11 04:24:54,118 WARN >>> > > > > > org.apache.zookeeper.ClientCnxn: >>> > > > > > >> > > Ignoring >>> > > > > > >> > > >> >> > exception during shutdown input >>> > > > > > >> > > >> >> > java.net.SocketException: Transport endpoint is >>> not >>> > > > > > connected >>> > > > > > >> > > >> >> > at >>> > sun.nio.ch.SocketChannelImpl.shutdown(Native >>> > > > > > Method) >>> > > > > > >> > > >> >> > at >>> > > > > > >> > > >> >> > >>> > > > > > >> > > >>> > > > > > >>> > > sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640) >>> > > > > > >> > > >> >> > at >>> > > > > > >> > > >> >>> > > > sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) >>> > > > > > >> > > >> >> > at >>> > > > > > >> > > >> >> > >>> > > > > > >> > > >> >>> > > > > > >> > >>> > > > > > >>> > > >>> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999) >>> > > > > > >> > > >> >> > at >>> > > > > > >> > > >> >> > >>> > > > > > >> > >>> > > > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970) >>> > > > > > >> > > >> >> > >>> > > > > > >> > > >> >> > >>> > > > > > >> > > >> >> > >>> > > > > > >> > > >> >> > >>> > > > > > >> > > >> >> > >>> > > > > > >> > > >> >> > On Wed, Nov 11, 2009 at 1:40 PM, Jean-Daniel >>> Cryans >>> > < >>> > > > > > >> > > >> [email protected] >>> > > > > > >> > > >> >> >wrote: >>> > > > > > >> > > >> >> > >>> > > > > > >> > > >> >> >> I'm really not sure exactly what the situation >>> is >>> > > with >>> > > > > the >>> > > > > > >> > > >> >> >> java.nio.BufferUnderflowException, but what do >>> you >>> > > mean >>> > > > > by >>> > > > > > >> > > >> >> >> "regionserver disconnection"? Could you give >>> more >>> > > > > details? >>> > > > > > >> > > >> >> >> >>> > > > > > >> > > >> >> >> Thx, >>> > > > > > >> > > >> >> >> >>> > > > > > >> > > >> >> >> J-D >>> > > > > > >> > > >> >> >> >>> > > > > > >> > > >> >> >> On Wed, Nov 11, 2009 at 10:34 AM, Zhenyu Zhong < >>> > > > > > >> > > >> [email protected] >>> > > > > > >> > > >> >> > >>> > > > > > >> > > >> >> >> wrote: >>> > > > > > >> > > >> >> >> > I searched around and found HBASE-1645 "HBase >>> > could >>> > > > be >>> > > > > a >>> > > > > > >> > little >>> > > > > > >> > > >> more >>> > > > > > >> > > >> >> >> dainty >>> > > > > > >> > > >> >> >> > when zk is pulled out from under it" >>> > > > > > >> > > >> >> >> > >>> > > > > > >> > > >> >> >> > I am wondering whether my problem is related >>> to >>> > > this >>> > > > > > >> > HBase-1645. >>> > > > > > >> > > Or >>> > > > > > >> > > >> >> >> probably >>> > > > > > >> > > >> >> >> > has nothing to do with it? >>> > > > > > >> > > >> >> >> > >>> > > > > > >> > > >> >> >> > thanks >>> > > > > > >> > > >> >> >> > >>> > > > > > >> > > >> >> >> > >>> > > > > > >> > > >> >> >> > On Wed, Nov 11, 2009 at 12:02 PM, Zhenyu Zhong >>> < >>> > > > > > >> > > >> >> [email protected] >>> > > > > > >> > > >> >> >> >wrote: >>> > > > > > >> > > >> >> >> > >>> > > > > > >> > > >> >> >> >> Hi, >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >> I have been monitoring the system resources >>> such >>> > > as >>> > > > > > memory >>> > > > > > >> > Heap >>> > > > > > >> > > >> usage >>> > > > > > >> > > >> >> >> for >>> > > > > > >> > > >> >> >> >> Regionserver, load average, network >>> bandwidth, >>> > > etc. >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >> I experienced a disconnection last night. >>> When I >>> > > > > checked >>> > > > > > >> the >>> > > > > > >> > > >> memory >>> > > > > > >> > > >> >> Heap >>> > > > > > >> > > >> >> >> >> for Regionserver, it was 2336 which is way >>> below >>> > > the >>> > > > > max >>> > > > > > >> Heap >>> > > > > > >> > > >> 4096. I >>> > > > > > >> > > >> >> >> don't >>> > > > > > >> > > >> >> >> >> think it is Regionserver problem. The >>> requests >>> > > that >>> > > > > the >>> > > > > > >> > > >> regionserver >>> > > > > > >> > > >> >> got >>> > > > > > >> > > >> >> >> was >>> > > > > > >> > > >> >> >> >> about 126 which doesn't seem to be high. The >>> > load >>> > > > > > average >>> > > > > > >> is >>> > > > > > >> > > kind >>> > > > > > >> > > >> of >>> > > > > > >> > > >> >> >> high >>> > > > > > >> > > >> >> >> >> >100. (but I saw disconnections while the >>> load >>> > > > average >>> > > > > > is >>> > > > > > >> > > around >>> > > > > > >> > > >> 30). >>> > > > > > >> > > >> >> >> The >>> > > > > > >> > > >> >> >> >> bandwidth is not a problem. >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >> However, I found error messages from the >>> > > zookeeper. >>> > > > > > Please >>> > > > > > >> > see >>> > > > > > >> > > the >>> > > > > > >> > > >> >> >> >> following error messages. There are some >>> > > > > > >> > > >> >> >> >> BufferUnderflowException error, IOException >>> > > > error,etc. >>> > > > > > >> > > >> >> >> >> May I ask what might be the causes that >>> generate >>> > > > these >>> > > > > > >> error >>> > > > > > >> > > >> >> messages? >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >> Many thanks >>> > > > > > >> > > >> >> >> >> zhenyu >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >> 2009-11-11 04:29:00,810 WARN >>> > > > > > >> > > org.apache.zookeeper.server.Request: >>> > > > > > >> > > >> >> >> Ignoring >>> > > > > > >> > > >> >> >> >> exception during toString >>> > > > > > >> > > >> >> >> >> java.nio.BufferUnderflowException >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127) >>> > > > > > >> > > >> >> >> >> at >>> > > > > java.nio.ByteBuffer.get(ByteBuffer.java:675) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >>> > > > > > >> >>> org.apache.zookeeper.server.Request.toString(Request.java:199) >>> > > > > > >> > > >> >> >> >> at >>> > > > java.lang.String.valueOf(String.java:2827) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > java.lang.StringBuilder.append(StringBuilder.java:115) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >>> > > > > > >> > > >> >> >>> > > > > > >> > > >> >>> > > > > > >> > > >>> > > > > > >> > >>> > > > > > >> >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> org.apache.zookeeper.server.quorum.CommitProcessor.processRequest(CommitProcessor.java:167) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >>> > > > > > >> > > >> >> >>> > > > > > >> > > >> >>> > > > > > >> > > >>> > > > > > >> > >>> > > > > > >> >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerRequestProcessor.java:68) >>> > > > > > >> > > >> >> >> >> 2009-11-11 04:29:00,810 WARN >>> > > > > > >> > > >> >> org.apache.zookeeper.server.NIOServerCnxn: >>> > > > > > >> > > >> >> >> >> Exception causing close of session >>> > > 0x424dfd908c50009 >>> > > > > > >> > > >> >> >> >> due to java.io.IOException: Read error >>> > > > > > >> > > >> >> >> >> 2009-11-11 04:29:00,810 DEBUG >>> > > > > > >> > > >> >> org.apache.zookeeper.server.NIOServerCnxn: >>> > > > > > >> > > >> >> >> >> IOException stack trace >>> > > > > > >> > > >> >> >> >> java.io.IOException: Read error >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >>> > > > > > >> > > >>> > > > > > >>> > > org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:372) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >>> > > > > > >> > > >> >> >>> > > > > > >> > > >> >>> > > > > > >> > > >>> > > > > > >> > >>> > > > > > >> >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:221) >>> > > > > > >> > > >> >> >> >> 2009-11-11 04:29:00,810 DEBUG >>> > > > > > >> > > >> >> >> >> >>> > > org.apache.zookeeper.server.quorum.CommitProcessor: >>> > > > > > >> > Processing >>> > > > > > >> > > >> >> request:: >>> > > > > > >> > > >> >> >> >> sessionid:0x424dfd908c50009 type:setWatches >>> > > > > > >> > > >> cxid:0xfffffffffffffff8 >>> > > > > > >> > > >> >> >> >> zxid:0xfffffffffffffffe txntype:unknown n/a >>> > > > > > >> > > >> >> >> >> 2009-11-11 04:29:00,810 WARN >>> > > > > > >> > > org.apache.zookeeper.server.Request: >>> > > > > > >> > > >> >> >> Ignoring >>> > > > > > >> > > >> >> >> >> exception during toString >>> > > > > > >> > > >> >> >> >> java.nio.BufferUnderflowException >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127) >>> > > > > > >> > > >> >> >> >> at >>> > > > > java.nio.ByteBuffer.get(ByteBuffer.java:675) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >>> > > > > > >> >>> org.apache.zookeeper.server.Request.toString(Request.java:199) >>> > > > > > >> > > >> >> >> >> at >>> > > > java.lang.String.valueOf(String.java:2827) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > java.lang.StringBuilder.append(StringBuilder.java:115) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >>> > > > > > >> > > >> >> >>> > > > > > >> > > >> >>> > > > > > >> > > >>> > > > > > >> > >>> > > > > > >> >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:74) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >>> > > > > > >> > > >> >> >>> > > > > > >> > > >> >>> > > > > > >> > > >>> > > > > > >> > >>> > > > > > >> >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73) >>> > > > > > >> > > >> >> >> >> 2009-11-11 04:29:00,810 DEBUG >>> > > > > > >> > > >> >> >> >> >>> > org.apache.zookeeper.server.FinalRequestProcessor: >>> > > > > > >> Processing >>> > > > > > >> > > >> >> request:: >>> > > > > > >> > > >> >> >> >> sessionid:0x424dfd908c50009 type:setWatches >>> > > > > > >> > > >> cxid:0xfffffffffffffff8 >>> > > > > > >> > > >> >> >> >> zxid:0xfffffffffffffffe txntype:unknown n/a >>> > > > > > >> > > >> >> >> >> 2009-11-11 04:29:00,810 WARN >>> > > > > > >> > > org.apache.zookeeper.server.Request: >>> > > > > > >> > > >> >> >> Ignoring >>> > > > > > >> > > >> >> >> >> exception during toString >>> > > > > > >> > > >> >> >> >> java.nio.BufferUnderflowException >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127) >>> > > > > > >> > > >> >> >> >> at >>> > > > > java.nio.ByteBuffer.get(ByteBuffer.java:675) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >>> > > > > > >> >>> org.apache.zookeeper.server.Request.toString(Request.java:199) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >>> > > > > > >> >>> > > > >>> org.apache.log4j.or.DefaultRenderer.doRender(DefaultRenderer.java:36) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >>> > > > org.apache.log4j.or.RendererMap.findAndRender(RendererMap.java:80) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >>> > > > > > >> > > >> >> >>> > > > > > >> > > >> >>> > > > > > >> > > >>> > > > > > >> > >>> > > > > > >> >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> org.apache.log4j.spi.LoggingEvent.getRenderedMessage(LoggingEvent.java:362) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >>> > > > > > >> > > >> >> >>> > > > > > >> > > >> >>> > > > > > >> > > >>> > > > > > >> > >>> > > > > > >> >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> org.apache.log4j.helpers.PatternParser$BasicPatternConverter.convert(PatternParser.java:403) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >>> > > > > > >> > > >> >> >>> > > > > > >> > > >> >>> > > > > > >> > > >>> > > > > > >> > >>> > > > > > >> >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> org.apache.log4j.helpers.PatternConverter.format(PatternConverter.java:65) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >>> > > > > org.apache.log4j.PatternLayout.format(PatternLayout.java:502) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >>> > > > org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:302) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >>> > > > > > >> > > >> >> >>> > > > > > >> > > >> >>> > > > > > >> > > >>> > > > > > >> > >>> > > > > > >> >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> org.apache.log4j.DailyRollingFileAppender.subAppend(DailyRollingFileAppender.java:359) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >>> > > > > > >> >>> org.apache.log4j.WriterAppender.append(WriterAppender.java:160) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >>> > > > > > >> >>> > > > >>> org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >>> > > > > > >> > > >> >> >>> > > > > > >> > > >> >>> > > > > > >> > > >>> > > > > > >> > >>> > > > > > >> >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >>> > org.apache.log4j.Category.callAppenders(Category.java:206) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > org.apache.log4j.Category.forcedLog(Category.java:391) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> org.apache.log4j.Category.debug(Category.java:260) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >>> > > > > > >> > > >> >> >>> > > > > > >> > > >> >>> > > > > > >> > > >>> > > > > > >> > >>> > > > > > >> >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:147) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >>> > > > > > >> > > >> >> >>> > > > > > >> > > >> >>> > > > > > >> > > >>> > > > > > >> > >>> > > > > > >> >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73) >>> > > > > > >> > > >> >> >> >> 2009-11-11 04:29:00,810 WARN >>> > > > > > >> > > org.apache.zookeeper.server.Request: >>> > > > > > >> > > >> >> >> Ignoring >>> > > > > > >> > > >> >> >> >> exception during toString >>> > > > > > >> > > >> >> >> >> sessionid:0x424dfd908c50009 type:setWatches >>> > > > > > >> > > >> cxid:0xfffffffffffffff8 >>> > > > > > >> > > >> >> >> >> zxid:0xfffffffffffffffe txntype:unknown n/a >>> > > > > > >> > > >> >> >> >> 2009-11-11 04:29:00,810 INFO >>> > > > > > >> > > >> >> org.apache.zookeeper.server.NIOServerCnxn: >>> > > > > > >> > > >> >> >> >> closing session:0x424dfd908c50009 >>> NIOServerCnxn: >>> > > > > > >> > > >> >> >> >> java.nio.channels.SocketChannel[connected >>> > local=/ >>> > > > > > >> > > >> >> 192.168.100.131:2181 >>> > > > > > >> > > >> >> >> remote=/ >>> > > > > > >> > > >> >> >> >> 192.168.100.131:35500] >>> > > > > > >> > > >> >> >> >> 2009-11-11 04:29:00,810 DEBUG >>> > > > > > >> > > >> >> org.apache.zookeeper.server.NIOServerCnxn: >>> > > > > > >> > > >> >> >> >> ignoring exception during input shutdown >>> > > > > > >> > > >> >> >> >> java.net.SocketException: Transport endpoint >>> is >>> > > not >>> > > > > > >> connected >>> > > > > > >> > > >> >> >> >> at >>> > > > > sun.nio.ch.SocketChannelImpl.shutdown(Native >>> > > > > > >> > Method) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >>> > > > > > >> > > >>> > > > > > >>> > > sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >>> > > > > > >> >>> sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >>> > > > > > >> > > >>> > > > > > >> >>> > > > > >>> > org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:848) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >>> > > > > > >> > > >>> > > > > > >>> > > org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:498) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >>> > > > > > >> > > >> >> >>> > > > > > >> > > >> >>> > > > > > >> > > >>> > > > > > >> > >>> > > > > > >> >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:221) >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> > >>> > > > > > >> > > >> >> >> >>> > > > > > >> > > >> >> > >>> > > > > > >> > > >> >> >>> > > > > > >> > > >> > >>> > > > > > >> > > >> >>> > > > > > >> > > > >>> > > > > > >> > > >>> > > > > > >> > >>> > > > > > >> >>> > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> >> >> >
