Looks like http://pero.blogs.aprilmayjune.org/2009/01/22/hadoop-and-linux-kernel-2627-epoll-limits/

J-D
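If the epoll limit described in that post is in play, a quick check is possible on the box running the master. This is a minimal sketch assuming a 2.6.27/2.6.28-era kernel where the per-user epoll knob exists; the suggested value is illustrative, not a recommendation from this thread:

    # Per-user cap on epoll instances (present on 2.6.27/2.6.28-era kernels).
    cat /proc/sys/fs/epoll/max_user_instances

    # Raise it if it is still at the old default of 128 (run as root; persist in /etc/sysctl.conf).
    sysctl -w fs.epoll.max_user_instances=4096

    # The ordinary descriptor limit for the current shell, for comparison.
    ulimit -n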
On Fri, Nov 13, 2009 at 1:12 PM, Zhenyu Zhong <[email protected]> wrote:

Hi,

After I re-organized the cluster, the experiment ran into the problem faster than before. Basically, the changes were to use the machines with fewer resources as zookeeper quorum members and the machines with more resources as regionservers.

From the log I still see the pause of around 1 minute. I enabled GC logging, please see http://pastebin.com/m1d4ce0f1 for details. In general I don't see many pauses in the GC.

What is more interesting: 1 hour after the 1st regionserver disconnected, the master log complained about too many open files. This didn't happen before. I checked the OS setup as well as limits.conf, and I also checked the running processes. The total number of open files doesn't reach the limit, so I am a bit confused.

Please see the following master log:

2009-11-13 20:06:02,114 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 4658 row(s) of meta region {server: 192.168.100.128:60021, regionname: .META.,,1, startKey: <>} complete
2009-11-13 20:06:02,114 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
2009-11-13 20:06:07,677 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x424eebf1c10004c after 3ms
2009-11-13 20:06:08,178 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.100.123:50010
2009-11-13 20:06:08,178 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-2808245019291145247_5478039
2009-11-13 20:06:09,682 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2009-11-13 20:06:09,682 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_1074853606841896259_5478048
2009-11-13 20:06:10,334 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x24eebf1043003c after 1ms
2009-11-13 20:06:21,018 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x424eebf1c10004c after 0ms
2009-11-13 20:06:23,674 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x24eebf1043003c after 0ms
2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.100.123:50010
2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6642544517082142289_5478063
2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_2057511041109796090_5478063
2009-11-13 20:06:24,928 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:24,928 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_8219260302213892894_5478064
2009-11-13 20:06:30,855 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:30,855 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_1669205542853067709_5478235
2009-11-13 20:06:30,905 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:30,905 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_9128897691346270351_5478237
2009-11-13 20:06:30,955 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:30,955 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_1116845144864123018_5478240
2009-11-13 20:06:34,372 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x424eebf1c10004c after 0ms
2009-11-13 20:06:37,034 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x24eebf1043003c after 0ms
2009-11-13 20:06:37,235 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Too many open files
        at sun.nio.ch.IOUtil.initPipe(Native Method)
        at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49)
        at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:407)
        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:322)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2290)
2009-11-13 20:06:37,235 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8148813491785406356_5478475 bad datanode[0] 192.168.100.123:50010
2009-11-13 20:06:37,235 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8148813491785406356_5478475 in pipeline 192.168.100.123:50010, 192.168.100.134:50010, 192.168.100.122:50010: bad datanode 192.168.100.123:50010
2009-11-13 20:06:37,436 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:37,436 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_2119727700857186236_5478498
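Since limits.conf looked fine but DFSClient is still hitting "Too many open files", it can help to count descriptors on the actual master process rather than system-wide. A minimal sketch; the jps/HMaster lookup and the /proc paths assume a JDK with jps and a reasonably recent kernel, and are not taken from this thread:

    # Find the HBase master JVM (it usually shows up in jps as HMaster).
    PID=$(jps | awk '/HMaster/ {print $1}')

    # Descriptors actually open in that process, versus its per-process limit.
    ls /proc/$PID/fd | wc -l
    grep 'Max open files' /proc/$PID/limits

    # System-wide usage versus the system-wide cap.
    cat /proc/sys/fs/file-nr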
On Thu, Nov 12, 2009 at 4:21 PM, Zhenyu Zhong <[email protected]> wrote:

Will do.

thanks
zhenyu

On Thu, Nov 12, 2009 at 3:33 PM, stack <[email protected]> wrote:

Enable GC logging too on this next run (see hbase-env.sh). Let's try and get to the bottom of what's going on.

Thanks,
St.Ack
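The hbase-env.sh change stack is referring to is usually just appending the standard HotSpot GC-logging flags to HBASE_OPTS. The flag set and log path below are one reasonable example, not the exact settings used in this thread:

    # conf/hbase-env.sh: add GC logging to the options every HBase daemon gets.
    export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
      -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime \
      -Xloggc:/tmp/hbase-gc.log"

If more than one daemon runs on a box, give each its own -Xloggc file (for example via a per-daemon OPTS variable if your hbase-env.sh has one), so they do not overwrite each other's log.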
On Thu, Nov 12, 2009 at 12:29 PM, Zhenyu Zhong <[email protected]> wrote:

I can switch the boxes that run zookeeper with the ones that run regionservers. I will see the results later.

FYI, the node does have the 10-minute zookeeper.session.timeout value in place.

thanks
zhenyu

On Thu, Nov 12, 2009 at 3:21 PM, stack <[email protected]> wrote:

On Thu, Nov 12, 2009 at 11:50 AM, Zhenyu Zhong <[email protected]> wrote:

> In my cluster, half of the cluster has 2 disks (400GB each) per machine, and half of the cluster has 6 disks per machine. Maybe we should run zookeeper on the machines with 2 disks and RS on the machines with 6 disks?

That would make most sense, only in the below it looks like the RS that had the issue had 4 disks?

> BTW, the 10 minutes zookeeper.session.timeout has been set during the experiment.

And for sure this node had it in place?

St.Ack
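To answer "for sure this node had it in place?", it is worth grepping the config on the node in question rather than trusting memory. The install path is an assumption; the property takes milliseconds, so a 10-minute timeout would show up as 600000:

    # Confirm the timeout override is present in the config this node loads (path is illustrative).
    grep -A1 'zookeeper.session.timeout' /opt/hbase/conf/hbase-site.xml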
On Thu, Nov 12, 2009 at 2:08 PM, stack <[email protected]> wrote:

On Thu, Nov 12, 2009 at 8:40 AM, Zhenyu Zhong <[email protected]> wrote:

> Though I experienced 2 regionserver disconnections this morning, it looks better from the regionserver log (please see the following log):
>
> http://pastebin.com/m496dbfae
>
> I checked disk I/O for one of the regionservers (192.168.100.116) during the time it disconnected.
>
> Time: 03:04:58 AM
> Device:   tps      Blk_read/s   Blk_wrtn/s   Blk_read      Blk_wrtn
> sda       105.31   5458.83      19503.64     9043873239    32312473676
> sda1      2.90     6.64         99.25        10993934      164433464
> sda2      1.72     23.76        30.25        39365817      50124008
> sda3      0.30     0.38         3.58         624930        5923000
> sda4      100.39   5428.06      19370.56     8992888270    32091993204

Is this high for you? 20k blocks/second would seem to be high, but it's one disk only and it's not being shared by zk anymore, so it shouldn't matter?
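If you want to watch this while the import runs rather than from a one-off snapshot, sysstat's extended per-device view is the usual tool; the 5-second interval is just a reasonable default:

    # Extended per-device stats (await, %util) every 5 seconds; needs the sysstat package.
    iostat -d -x -k 5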
> I also checked the zookeeper quorum server that the regionserver tried to connect to, according to the log. However, I don't see 192.168.100.116 in the client list of that zookeeper quorum server. Would that be a problem?

Is that because the ephemeral node for the regionserver had evaporated? Lost its lease w/ zk by the time you went to look?

> Thu Nov 12 15:04:52 UTC 2009
> Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
> Clients:
>  /192.168.100.127:43045[1](queued=0,recved=26,sent=0)
>  /192.168.100.131:39091[1](queued=0,recved=964,sent=0)
>  /192.168.100.124:35961[1](queued=0,recved=3266,sent=0)
>  /192.168.100.123:47935[1](queued=0,recved=233,sent=0)
>  /192.168.100.125:46931[1](queued=0,recved=2,sent=0)
>  /192.168.100.118:54924[1](queued=0,recved=295,sent=0)
>  /192.168.100.118:41390[1](queued=0,recved=2290,sent=0)
>  /192.168.100.136:42243[1](queued=0,recved=0,sent=0)
>
> Latency min/avg/max: 0/17/6333
> Received: 47111
> Sent: 0
> Outstanding: 0
> Zxid: 0x77000083f4
> Mode: leader
> Node count: 23

That 6-second maximum latency is pretty bad, but it should be well within the zk session timeout.

So the problem is likely on the zk client side of the session, i.e. in the RS. You could enable GC logging as J-D suggested, to see if you have any big pauses: pauses > zk session timeout.

When the RS went down, it didn't look too heavily loaded:

2009-11-12 15:04:52,830 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=1.5166667, regions=322, stores=657, storefiles=631, storefileIndexSize=61, memstoreSize=1472, usedHeap=2819, maxHeap=4079, blockCacheSize=658110960, blockCacheFree=197395984, blockCacheCount=9903, blockCacheHitRatio=99

It's serving a few reads? The number of store files seems fine. Not too much memory used.

Looking at the logs, I see the Lease Still Held exception. This happens when the RS does its regular report to the master but the master thinks the RS has since restarted. It'll think this probably because it noticed that the RS's znode in zk had gone away, and it considered the RS dead.

Looking too at your logs, I see this gap in the zk pinging:

2009-11-12 15:03:39,325 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x224e55436ad0004 after 0ms
2009-11-12 15:03:43,113 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x24e55436a0007d after 0ms

Where in the lines above it, it's reporting about every ten seconds; here there is a big gap.

Do you have ganglia or something that will let you look more into what was happening on this machine around this time? Is the machine OK? It looks lightly loaded and you have your cluster nicely laid out. Something odd is going on. What about things like the write speed to disk? In the past, strange issues have been explained by an incorrectly set BIOS which made disks run at 1/100th of their proper speed.

St.Ack
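One way to hunt for these gaps and slow pings without reading the whole log by eye is to filter the ping-response lines by latency; the log path is an assumption and the "4 or more digits" trick simply keeps responses of 1000ms and up:

    # ZooKeeper ping responses that took 1000ms or more (path is illustrative).
    grep 'Got ping response' /var/log/hbase/regionserver.log | grep -vE 'after [0-9]{1,3}ms$'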
Basically, we read in a >>> text >>> > > file >>> > > > > from >>> > > > > > >> > HDFS, >>> > > > > > >> > then for each row in the text file, we fetch data from >>> HBase >>> > > > table, >>> > > > > > we >>> > > > > > >> > compare the column value and update the latest value back >>> into >>> > > > > HBase. >>> > > > > > I >>> > > > > > >> was >>> > > > > > >> > suggested to use CheckAndPut. Anyway this is the common >>> > usecase >>> > > > on >>> > > > > my >>> > > > > > >> > side. >>> > > > > > >> > >>> > > > > > >> > >>> > > > > > >> > Many thanks >>> > > > > > >> > zhenyu >>> > > > > > >> > >>> > > > > > >> > >>> > > > > > >> > >>> > > > > > >> > On Wed, Nov 11, 2009 at 3:03 PM, Jean-Daniel Cryans < >>> > > > > > >> [email protected] >>> > > > > > >> > >wrote: >>> > > > > > >> > >>> > > > > > >> > > 6 disks, 19 machines and you see those kind of issues? >>> I'm >>> > > quite >>> > > > > > >> > > surprised and I want to learn more about what you trying >>> to >>> > > to. >>> > > > > > >> > > >>> > > > > > >> > > But first, if hbase is managing ZK for you then you can >>> > change >>> > > > the >>> > > > > > >> > > value of hbase.zookeeper.property.dataDir so that it >>> points >>> > to >>> > > > its >>> > > > > > own >>> > > > > > >> > > mount. Make sure you also change dfs.data.dir in >>> > > > > conf/hdfs-site.xml >>> > > > > > >> > > (hadoop stuff) so that it stops pointing there (you >>> should >>> > > have >>> > > > 6 >>> > > > > > >> > > values there at the moment if you configured it >>> correctly). >>> > > > Doing >>> > > > > > this >>> > > > > > >> > > could result in data loss if you do it all at once so do >>> the >>> > > > > change >>> > > > > > >> > > progressively. This also means that you may have to keep >>> a >>> > > > > seperate >>> > > > > > >> > > HDFS configuration for those particular nodes. Please ask >>> > more >>> > > > > > >> > > questions if it's not very clear. >>> > > > > > >> > > >>> > > > > > >> > > WRT to dedicating nodes to Zookeeeper, it makes sense >>> only >>> > if >>> > > > you >>> > > > > > have >>> > > > > > >> > > a lot of nodes but in your case, unless there's also CPU >>> > > > > contention, >>> > > > > > >> > > you're good with only a dedicated disk. >>> > > > > > >> > > >>> > > > > > >> > > Now for what you are trying to do, is this a massive >>> import >>> > > that >>> > > > > > goes >>> > > > > > >> > > over many hours? How big is each row? I'd love to get a >>> lot >>> > of >>> > > > > > details >>> > > > > > >> > > to help you optimize your cluster for your particular >>> > problem. >>> > > > > > >> > > >>> > > > > > >> > > Thx, >>> > > > > > >> > > >>> > > > > > >> > > J-D >>> > > > > > >> > > >>> > > > > > >> > > On Wed, Nov 11, 2009 at 11:51 AM, Zhenyu Zhong < >>> > > > > > >> [email protected]> >>> > > > > > >> > > wrote: >>> > > > > > >> > > > J-D, >>> > > > > > >> > > > >>> > > > > > >> > > > Thanks a lot for the insightful comments. >>> > > > > > >> > > > We are running 5 zookeeper quorums now. On these 5 >>> quorum >>> > > > > servers, >>> > > > > > >> we >>> > > > > > >> > > also >>> > > > > > >> > > > run datanode, tasktracker and regionserver. >>> > > > > > >> > > > Now we have 6 disks per machine. >>> > > > > > >> > > > If it is the disk IO saturation, is there anyway to >>> > > configure >>> > > > > > >> zookeeper >>> > > > > > >> > > to >>> > > > > > >> > > > write to a seperate disk which is not the same one used >>> by >>> > > > HDFS? 
On Wed, Nov 11, 2009 at 12:29 PM, Zhenyu Zhong <[email protected]> wrote:

Thanks a lot for the help.

I understand that the change of disk configuration should be done progressively, so that HDFS can balance and recover from a disk going away.

The data we are trying to import into HBase is about 200 million rows, each row about hundreds of bytes. Basically, we read in a text file from HDFS, then for each row in the text file we fetch data from the HBase table, compare the column value, and update the latest value back into HBase. I was advised to use CheckAndPut. Anyway, this is the common use case on my side.

Many thanks
zhenyu

On Wed, Nov 11, 2009 at 3:03 PM, Jean-Daniel Cryans <[email protected]> wrote:

6 disks, 19 machines, and you see those kinds of issues? I'm quite surprised, and I want to learn more about what you are trying to do.

But first, if hbase is managing ZK for you, then you can change the value of hbase.zookeeper.property.dataDir so that it points to its own mount. Make sure you also change dfs.data.dir in conf/hdfs-site.xml (hadoop stuff) so that it stops pointing there (you should have 6 values there at the moment if you configured it correctly). Doing this could result in data loss if you do it all at once, so make the change progressively. This also means that you may have to keep a separate HDFS configuration for those particular nodes. Please ask more questions if it's not very clear.

WRT dedicating nodes to Zookeeper, it makes sense only if you have a lot of nodes, but in your case, unless there's also CPU contention, you're good with only a dedicated disk.

Now for what you are trying to do: is this a massive import that goes over many hours? How big is each row? I'd love to get a lot of details to help you optimize your cluster for your particular problem.

Thx,

J-D
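When splitting the ZooKeeper dataDir off as J-D suggests, a quick sanity check is that the directory really sits on a mount that dfs.data.dir does not use; all paths below are illustrative examples, not values from this thread:

    # Which filesystem the ZK data directory lands on (example path).
    df -h /zookeeper-data

    # Compare with the mounts backing the HDFS data directories (example paths).
    df -h /data1/dfs /data2/dfs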
On Wed, Nov 11, 2009 at 11:51 AM, Zhenyu Zhong <[email protected]> wrote:

J-D,

Thanks a lot for the insightful comments.

We are running 5 zookeeper quorum members now. On these 5 quorum servers we also run a datanode, tasktracker, and regionserver. Now we have 6 disks per machine. If it is disk IO saturation, is there any way to configure zookeeper to write to a separate disk which is not the same one used by HDFS? Or shall we run the zookeeper quorum on different nodes than HDFS?

Are HLogs generated by Regionservers? Would it alleviate the issue if we use the first 14 nodes for regionservers and the remaining 5 for zookeeper? (In that sense, a node running a zookeeper quorum member won't run a regionserver.)

I'd appreciate your comments.

Best,
zhenyu

On Wed, Nov 11, 2009 at 2:30 PM, Jean-Daniel Cryans <[email protected]> wrote:

The problem seems to be that the IO is saturated. When you see this:

2009-11-11 04:28:10,308 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 9 on 60021 took 6819ms appending an edit to hlog; editcount=370
2009-11-11 04:28:10,308 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 2 on 60021 took 4412ms appending an edit to hlog; editcount=371

it's a sign that HDFS is slow. I guess that the zookeeper quorum members are on those same nodes, using the same disks? If so, there's the problem, because Zookeeper needs to be able to commit transactions very fast, and if your disks are already hogged then they have to wait and you can get disconnections. Do you still have 19 quorum members? How many disks do you have per machine?

Thx

J-D
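Those slow-append warnings are easy to trend over a run: a simple grep over the regionserver log pulls every append HBase itself flagged as slow, with how long it took. The log path is an assumption:

    # Every HLog append the regionserver complained about (path is illustrative).
    grep 'appending an edit to hlog' /var/log/hbase/regionserver.log | tail -50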
On Wed, Nov 11, 2009 at 11:22 AM, Zhenyu Zhong <[email protected]> wrote:

Please take a look at the following regionserver log. The errors started to come out from 4:24am UTC to 4:29am UTC.

http://pastebin.com/m3717d064

Thanks in advance.

Best,
zhenyu

On Wed, Nov 11, 2009 at 2:01 PM, Jean-Daniel Cryans <[email protected]> wrote:

Could you paste a bigger log trace (interested in seeing at least the 300 lines before that) in a pastebin?

Thx,

J-D
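To grab those 300 lines of context in one shot before pasting, grep's -B flag works well; the marker string and log path below are just examples of the pattern to anchor on:

    # 300 lines of context before the first matching error of that window (path/pattern illustrative).
    grep -B 300 -m 1 'Exception closing session' /var/log/hbase/regionserver.log > /tmp/rs-context.txt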
On Wed, Nov 11, 2009 at 10:56 AM, Zhenyu Zhong <[email protected]> wrote:

J-D,

Thanks for the reply.

The problem I have is that one or more of the regionservers disconnect from the cluster.

I checked the resource usage, including memory heap usage for the regionserver, and found nothing suspicious. The regionserver heap usage is way below the maxHeap defined in the JVM.

So I checked the zookeeper log, and I found the following BufferUnderflowException; eventually it leads to SocketException: Transport endpoint is not connected.

I also checked the regionserver log. It looks like the regionserver tried to connect to another zookeeper quorum server. It connects successfully, but then it complains about SocketException: Transport endpoint is not connected.

So both the zookeeper log and the regionserver log tell me that the transport endpoint is not connected. I don't know why.

I would appreciate it if you could point me in the right direction.

Best,

Regionserver log:

2009-11-11 04:24:53,718 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server superpyxis0018/192.168.100.132:2181
2009-11-11 04:24:53,718 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/192.168.100.131:43065 remote=superpyxis0018/192.168.100.132:2181]
2009-11-11 04:24:53,718 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2009-11-11 04:24:53,768 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: Disconnected, type: None, path: null
2009-11-11 04:24:54,118 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server superpyxis0017/192.168.100.131:2181
2009-11-11 04:24:54,118 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/192.168.100.131:46903 remote=superpyxis0017/192.168.100.131:2181]
2009-11-11 04:24:54,118 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2009-11-11 04:24:54,118 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x24dfd90c810002 to sun.nio.ch.selectionkeyi...@2dd78ccf
java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
        at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
2009-11-11 04:24:54,118 WARN org.apache.zookeeper.ClientCnxn: Ignoring exception during shutdown input
java.net.SocketException: Transport endpoint is not connected
        at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
        at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
        at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
        at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
(but I saw disconnections while the >>> load >>> > > > average >>> > > > > > is >>> > > > > > >> > > around >>> > > > > > >> > > >> 30). >>> > > > > > >> > > >> >> >> The >>> > > > > > >> > > >> >> >> >> bandwidth is not a problem. >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >> However, I found error messages from the >>> > > zookeeper. >>> > > > > > Please >>> > > > > > >> > see >>> > > > > > >> > > the >>> > > > > > >> > > >> >> >> >> following error messages. There are some >>> > > > > > >> > > >> >> >> >> BufferUnderflowException error, IOException >>> > > > error,etc. >>> > > > > > >> > > >> >> >> >> May I ask what might be the causes that >>> generate >>> > > > these >>> > > > > > >> error >>> > > > > > >> > > >> >> messages? >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >> Many thanks >>> > > > > > >> > > >> >> >> >> zhenyu >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >> 2009-11-11 04:29:00,810 WARN >>> > > > > > >> > > org.apache.zookeeper.server.Request: >>> > > > > > >> > > >> >> >> Ignoring >>> > > > > > >> > > >> >> >> >> exception during toString >>> > > > > > >> > > >> >> >> >> java.nio.BufferUnderflowException >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127) >>> > > > > > >> > > >> >> >> >> at >>> > > > > java.nio.ByteBuffer.get(ByteBuffer.java:675) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >>> > > > > > >> >>> org.apache.zookeeper.server.Request.toString(Request.java:199) >>> > > > > > >> > > >> >> >> >> at >>> > > > java.lang.String.valueOf(String.java:2827) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > java.lang.StringBuilder.append(StringBuilder.java:115) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >>> > > > > > >> > > >> >> >>> > > > > > >> > > >> >>> > > > > > >> > > >>> > > > > > >> > >>> > > > > > >> >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> org.apache.zookeeper.server.quorum.CommitProcessor.processRequest(CommitProcessor.java:167) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >>> > > > > > >> > > >> >> >>> > > > > > >> > > >> >>> > > > > > >> > > >>> > > > > > >> > >>> > > > > > >> >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerRequestProcessor.java:68) >>> > > > > > >> > > >> >> >> >> 2009-11-11 04:29:00,810 WARN >>> > > > > > >> > > >> >> org.apache.zookeeper.server.NIOServerCnxn: >>> > > > > > >> > > >> >> >> >> Exception causing close of session >>> > > 0x424dfd908c50009 >>> > > > > > >> > > >> >> >> >> due to java.io.IOException: Read error >>> > > > > > >> > > >> >> >> >> 2009-11-11 04:29:00,810 DEBUG >>> > > > > > >> > > >> >> org.apache.zookeeper.server.NIOServerCnxn: >>> > > > > > >> > > >> >> >> >> IOException stack trace >>> > > > > > >> > > >> >> >> >> java.io.IOException: Read error >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >>> > > > > > >> > > >>> > > > > > >>> > > org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:372) >>> > > > > > >> > > >> >> >> >> at >>> > > > > > >> > > >> >> >> >> >>> > > > > > >> > > >> >> >> >>> > > > > > >> > > >> >> >>> > > > > > >> > > >> >>> > > > > > >> > > >>> > > > > > >> > >>> > > > > > >> >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> 
On Wed, Nov 11, 2009 at 12:02 PM, Zhenyu Zhong <[email protected]> wrote:

Hi,

I have been monitoring system resources such as memory heap usage for the Regionserver, load average, network bandwidth, etc.

I experienced a disconnection last night. When I checked the memory heap for the Regionserver, it was 2336, which is way below the max heap of 4096, so I don't think it is a Regionserver problem. The requests that the regionserver got were about 126, which doesn't seem high. The load average is kind of high, >100 (but I have seen disconnections while the load average is around 30). The bandwidth is not a problem.

However, I found error messages from the zookeeper. Please see the following error messages. There are some BufferUnderflowException errors, IOException errors, etc. May I ask what might be the causes that generate these error messages?

Many thanks
zhenyu

2009-11-11 04:29:00,810 WARN org.apache.zookeeper.server.Request: Ignoring exception during toString
java.nio.BufferUnderflowException
        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
        at java.nio.ByteBuffer.get(ByteBuffer.java:675)
        at org.apache.zookeeper.server.Request.toString(Request.java:199)
        at java.lang.String.valueOf(String.java:2827)
        at java.lang.StringBuilder.append(StringBuilder.java:115)
        at org.apache.zookeeper.server.quorum.CommitProcessor.processRequest(CommitProcessor.java:167)
        at org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerRequestProcessor.java:68)
2009-11-11 04:29:00,810 WARN org.apache.zookeeper.server.NIOServerCnxn: Exception causing close of session 0x424dfd908c50009 due to java.io.IOException: Read error
2009-11-11 04:29:00,810 DEBUG org.apache.zookeeper.server.NIOServerCnxn: IOException stack trace
java.io.IOException: Read error
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:372)
        at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:221)
2009-11-11 04:29:00,810 DEBUG org.apache.zookeeper.server.quorum.CommitProcessor: Processing request:: sessionid:0x424dfd908c50009 type:setWatches cxid:0xfffffffffffffff8 zxid:0xfffffffffffffffe txntype:unknown n/a
2009-11-11 04:29:00,810 WARN org.apache.zookeeper.server.Request: Ignoring exception during toString
java.nio.BufferUnderflowException
        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
        at java.nio.ByteBuffer.get(ByteBuffer.java:675)
        at org.apache.zookeeper.server.Request.toString(Request.java:199)
        at java.lang.String.valueOf(String.java:2827)
        at java.lang.StringBuilder.append(StringBuilder.java:115)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:74)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
2009-11-11 04:29:00,810 DEBUG org.apache.zookeeper.server.FinalRequestProcessor: Processing request:: sessionid:0x424dfd908c50009 type:setWatches cxid:0xfffffffffffffff8 zxid:0xfffffffffffffffe txntype:unknown n/a
2009-11-11 04:29:00,810 WARN org.apache.zookeeper.server.Request: Ignoring exception during toString
java.nio.BufferUnderflowException
        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
        at java.nio.ByteBuffer.get(ByteBuffer.java:675)
        at org.apache.zookeeper.server.Request.toString(Request.java:199)
        at org.apache.log4j.or.DefaultRenderer.doRender(DefaultRenderer.java:36)
        at org.apache.log4j.or.RendererMap.findAndRender(RendererMap.java:80)
        at org.apache.log4j.spi.LoggingEvent.getRenderedMessage(LoggingEvent.java:362)
        at org.apache.log4j.helpers.PatternParser$BasicPatternConverter.convert(PatternParser.java:403)
        at org.apache.log4j.helpers.PatternConverter.format(PatternConverter.java:65)
        at org.apache.log4j.PatternLayout.format(PatternLayout.java:502)
        at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:302)
        at org.apache.log4j.DailyRollingFileAppender.subAppend(DailyRollingFileAppender.java:359)
        at org.apache.log4j.WriterAppender.append(WriterAppender.java:160)
        at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
        at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
        at org.apache.log4j.Category.callAppenders(Category.java:206)
        at org.apache.log4j.Category.forcedLog(Category.java:391)
        at org.apache.log4j.Category.debug(Category.java:260)
        at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:147)
        at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
2009-11-11 04:29:00,810 WARN org.apache.zookeeper.server.Request: Ignoring exception during toString
sessionid:0x424dfd908c50009 type:setWatches cxid:0xfffffffffffffff8 zxid:0xfffffffffffffffe txntype:unknown n/a
2009-11-11 04:29:00,810 INFO org.apache.zookeeper.server.NIOServerCnxn: closing session:0x424dfd908c50009 NIOServerCnxn: java.nio.channels.SocketChannel[connected local=/192.168.100.131:2181 remote=/192.168.100.131:35500]
2009-11-11 04:29:00,810 DEBUG org.apache.zookeeper.server.NIOServerCnxn: ignoring exception during input shutdown
java.net.SocketException: Transport endpoint is not connected
        at sun.nio.ch.SocketChannelImpl.shutdown(Native Method)
        at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
        at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
        at org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:848)
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:498)
        at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:221)
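Since the client-side disconnects (around 04:24) and the server-side session close (around 04:29) bracket the same window, pulling that window out of both logs side by side is a quick way to line the events up; the paths below are illustrative:

    # The 04:24-04:29 UTC window from both the regionserver and the ZooKeeper server logs.
    grep -E '^2009-11-11 04:2[4-9]' /var/log/hbase/regionserver.log
    grep -E '^2009-11-11 04:2[4-9]' /var/log/zookeeper/zookeeper.log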
