Hi,

After I re-organized the cluster, the experiment ran into problems faster than
before.

Basically, the change was to use the machines with fewer resources as ZooKeeper
quorum members and the machines with more resources as regionservers.

From the log, I still see the pause of around 1 minute.
I enabled GC logging; please see

http://pastebin.com/m1d4ce0f1

for details.
In general I don't see many pauses in the GC log.
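
(For reference, GC logging was enabled in conf/hbase-env.sh with roughly the
following flags; the exact flags and the log path shown here are approximate,
not the literal line from my config:)

# in conf/hbase-env.sh, approximate flags used to turn on GC logging
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
    -XX:+PrintGCTimeStamps -Xloggc:/path/to/gc-hbase.log"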

What is more interesting, 1 hour after the 1st regionserver disconnected,
the master log started complaining about too many open files. This didn't
happen before.
I checked the OS settings as well as limits.conf, and I also checked the
running processes. The total number of open files doesn't reach the limit, so
I am a bit confused.
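
(For reference, a rough sketch of the kind of check I ran, comparing what a
process actually holds open against its limit; the pid lookup below is just a
placeholder:)

# pick the pid of the master (or a regionserver)
PID=$(jps | awk '/HMaster/ {print $1}')
# descriptors the process currently holds open
ls /proc/$PID/fd | wc -l
# per-process limit as seen by the running daemon (newer kernels only;
# otherwise check limits.conf and ulimit -n for the launching user)
grep 'open files' /proc/$PID/limits
# system-wide file handles: allocated, free, maximum
cat /proc/sys/fs/file-nr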


Please see the following master log.

2009-11-13 20:06:02,114 INFO org.apache.hadoop.hbase.master.BaseScanner:
RegionManager.metaScanner scan of 4658 row(s) of meta region {server:
192.168.100.128:60021, regionname: .META.,,1, startKey: <>} complete
2009-11-13 20:06:02,114 INFO org.apache.hadoop.hbase.master.BaseScanner: All
1 .META. region(s) scanned
2009-11-13 20:06:07,677 DEBUG org.apache.zookeeper.ClientCnxn: Got ping
response for sessionid:0x424eebf1c10004c after 3ms
2009-11-13 20:06:08,178 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
createBlockOutputStream java.io.IOException: Bad connect ack with
firstBadLink 192.168.100.123:50010
2009-11-13 20:06:08,178 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
block blk_-2808245019291145247_5478039
2009-11-13 20:06:09,682 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
createBlockOutputStream java.io.EOFException
2009-11-13 20:06:09,682 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
block blk_1074853606841896259_5478048
2009-11-13 20:06:10,334 DEBUG org.apache.zookeeper.ClientCnxn: Got ping
response for sessionid:0x24eebf1043003c after 1ms
2009-11-13 20:06:21,018 DEBUG org.apache.zookeeper.ClientCnxn: Got ping
response for sessionid:0x424eebf1c10004c after 0ms
2009-11-13 20:06:23,674 DEBUG org.apache.zookeeper.ClientCnxn: Got ping
response for sessionid:0x24eebf1043003c after 0ms
2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
createBlockOutputStream java.io.IOException: Bad connect ack with
firstBadLink 192.168.100.123:50010
2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
block blk_-6642544517082142289_5478063
2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
block blk_2057511041109796090_5478063
2009-11-13 20:06:24,928 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:24,928 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
block blk_8219260302213892894_5478064
2009-11-13 20:06:30,855 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:30,855 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
block blk_1669205542853067709_5478235
2009-11-13 20:06:30,905 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:30,905 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
block blk_9128897691346270351_5478237
2009-11-13 20:06:30,955 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:30,955 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
block blk_1116845144864123018_5478240
2009-11-13 20:06:34,372 DEBUG org.apache.zookeeper.ClientCnxn: Got ping
response for sessionid:0x424eebf1c10004c after 0ms
2009-11-13 20:06:37,034 DEBUG org.apache.zookeeper.ClientCnxn: Got ping
response for sessionid:0x24eebf1043003c after 0ms
2009-11-13 20:06:37,235 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
Exception: java.io.IOException: Too many open files
        at sun.nio.ch.IOUtil.initPipe(Native Method)
        at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49)
        at
sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
        at
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:407)
        at
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:322)
        at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
        at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2290)

2009-11-13 20:06:37,235 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block blk_8148813491785406356_5478475 bad datanode[0]
192.168.100.123:50010
2009-11-13 20:06:37,235 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block blk_8148813491785406356_5478475 in pipeline
192.168.100.123:50010, 192.168.100.134:50010, 192.168.100.122:50010: bad
datanode 192.168.100.123:50010
2009-11-13 20:06:37,436 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:37,436 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
block blk_2119727700857186236_5478498

On Thu, Nov 12, 2009 at 4:21 PM, Zhenyu Zhong <[email protected]> wrote:

> Will do.
>
> thanks
> zhenyu
>
>
> On Thu, Nov 12, 2009 at 3:33 PM, stack <[email protected]> wrote:
>
>> Enable GC logging too on this next run (see hbase-env.sh).  Lets try and
>> get
>> to the bottom of whats going on.
>> Thanks,
>> St.Ack
>>
>> On Thu, Nov 12, 2009 at 12:29 PM, Zhenyu Zhong <[email protected]
>> >wrote:
>>
>> > I can switch the boxes that run zookeeper with the ones that run
>> > regionservers.
>> > I will see the results later.
>> >
>> > FYI. The node does have the 10 minutes zookeeper.session.timeout value
>> in
>> > place.
>> >
>> > thanks
>> > zhenyu
>> >
>> > On Thu, Nov 12, 2009 at 3:21 PM, stack <[email protected]> wrote:
>> >
>> > > On Thu, Nov 12, 2009 at 11:50 AM, Zhenyu Zhong <
>> [email protected]
>> > > >wrote:
>> > >
>> > > >
>> > > > In my cluster, half of the cluster have 2 disks 400GB each per
>> machine,
>> > > and
>> > > > half of the cluster have 6 disks per machine.  Maybe we should run
>> > > > zookeeper
>> > > > on the machines with 2 disks and RS on machines with 6 disks?
>> > > >
>> > > > That would make most sense only in the below, it looks like the RS
>> that
>> > > had
>> > > issue had 4 disks?
>> > >
>> > >
>> > >
>> > > > BTW, the 10 minutes zookeeper.session.timeout has been set during
>> the
>> > > > experiment.
>> > > >
>> > >
>> > > And for sure this node had it in place?
>> > > St.Ack
>> > >
>> > >
>> > >
>> > > >
>> > > > thanks
>> > > > zhenyu
>> > > >
>> > > > On Thu, Nov 12, 2009 at 2:08 PM, stack <[email protected]> wrote:
>> > > >
>> > > > > On Thu, Nov 12, 2009 at 8:40 AM, Zhenyu Zhong <
>> > [email protected]
>> > > > > >wrote:
>> > > > >
>> > > > >
>> > > > > > Though I experienced 2 regionserver disconnection this morning,
>> it
>> > > > looks
>> > > > > > better from the regionserver log. (please see the following log)
>> > > > > >
>> > > > > >
>> > > > > > http://pastebin.com/m496dbfae
>> > > > > >
>> > > > > >
>> > > > > > I checked diskIO for one of the regionserver(192.168.100.116)
>> > during
>> > > > the
>> > > > > > time it disconnected.
>> > > > > >
>> > > > > > Time: 03:04:58 AM
>> > > > > > Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read
>> > > Blk_wrtn
>> > > > > > sda             105.31      5458.83     19503.64 9043873239
>> > > 32312473676
>> > > > > > sda1              2.90         6.64        99.25   10993934
>> > >  164433464
>> > > > > > sda2              1.72        23.76        30.25   39365817
>> > > 50124008
>> > > > > > sda3              0.30         0.38         3.58     624930
>> > >  5923000
>> > > > > > sda4            100.39      5428.06     19370.56 8992888270
>> > > 32091993204
>> > > > > >
>> > > > > >
>> > > > > Is this high for you?  20k blocks/second would seem to be high but
>> > its
>> > > > one
>> > > > > disk only and its not being shared by zk anymore so shouldn't
>> matter?
>> > > > >
>> > > > >
>> > > > > > I also checked the zookeeper quorum server that the regionserver
>> > > tried
>> > > > to
>> > > > > > connect according to the log. However, I don't see
>> 192.168.100.116
>> > in
>> > > > the
>> > > > > > client list of the zookeeper quorum server that the regionserver
>> > > tried
>> > > > to
>> > > > > > connect.
>> > > > > > Would that be a problem?
>> > > > > >
>> > > > > >
>> > > > > Is that because the ephemeral node for the regionserver had
>> > evaporated?
>> > > > > Lost
>> > > > > its lease w/ zk by the time you went to look?
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > > Thu Nov 12 15:04:52 UTC 2009
>> > > > > > Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
>> > > > > > Clients:
>> > > > > >  /192.168.100.127:43045[1](queued=0,recved=26,sent=0)
>> > > > > >  /192.168.100.131:39091[1](queued=0,recved=964,sent=0)
>> > > > > >  /192.168.100.124:35961[1](queued=0,recved=3266,sent=0)
>> > > > > >  /192.168.100.123:47935[1](queued=0,recved=233,sent=0)
>> > > > > >  /192.168.100.125:46931[1](queued=0,recved=2,sent=0)
>> > > > > >  /192.168.100.118:54924[1](queued=0,recved=295,sent=0)
>> > > > > >  /192.168.100.118:41390[1](queued=0,recved=2290,sent=0)
>> > > > > >  /192.168.100.136:42243[1](queued=0,recved=0,sent=0)
>> > > > > >
>> > > > > > Latency min/avg/max: 0/17/6333
>> > > > > > Received: 47111
>> > > > > > Sent: 0
>> > > > > > Outstanding: 0
>> > > > > > Zxid: 0x77000083f4
>> > > > > > Mode: leader
>> > > > > > Node count: 23
>> > > > > >
>> > > > >
>> > > > >
>> > > > > That 6 second maximum latency is pretty bad but should be well
>> within
>> > > the
>> > > > > zk
>> > > > > session timeout.
>> > > > >
>> > > > > So, problem is likely on the zk client side of the session; i.e.
>> in
>> > the
>> > > > RS.
>> > > > > You could enable GC logging as J-D suggested to see if you have
>> any
>> > big
>> > > > > pauses, pauses > zk session timeout.
>> > > > >
>> > > > > When the RS went down, it didn't look too heavily loaded:
>> > > > >
>> > > > >
>> > > > >   1. 2009-11-12 15:04:52,830 INFO
>> > > > >   org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of
>> > metrics:
>> > > > >   request=1.5166667, regions=322, stores=657, storefiles=631,
>> > > > >   storefileIndexSize=61, memstoreSize=1472, usedHeap=2819,
>> > > maxHeap=4079,
>> > > > >   blockCacheSize=658110960, blockCacheFree=197395984,
>> > > > blockCacheCount=9903,
>> > > > >   blockCacheHitRatio=99
>> > > > >
>> > > > > Its serving a few reads?  The number of store files seems fine.
>> Not
>> > > too
>> > > > > much memory used.
>> > > > >
>> > > > > Looking at the logs, I see the Lease Still Held exception.  This
>> > > happens
>> > > > > when the RS does its regular report to the master but the master
>> > thinks
>> > > > the
>> > > > > RS has since restarted.  It'll think this probably because it
>> noticed
>> > > > that
>> > > > > the RS's znode in zk had gone away and it considered the RS dead.
>> > > > >
>> > > > > Looking too at your  logs I see this gap in the zk pinging:
>> > > > >
>> > > > >
>> > > > >   1. 2009-11-12 15:03:39,325 DEBUG
>> org.apache.zookeeper.ClientCnxn:
>> > Got
>> > > > >   ping response for sessionid:0x224e55436ad0004 after 0ms
>> > > > >   2. 2009-11-12 15:03:43,113 DEBUG
>> org.apache.zookeeper.ClientCnxn:
>> > Got
>> > > > >   ping response for sessionid:0x24e55436a0007d after 0ms
>> > > > >
>> > > > > Where in the lines above it, its reporting about every ten
>> seconds,
>> > > here
>> > > > > there is a big gap.
>> > > > >
>> > > > > Do you have ganglia or something that will let you look more into
>> > what
>> > > > was
>> > > > > happening on this machine around this time?  Is the machine OK?
>>  It
>> > > looks
>> > > > > lightly loaded and you have your cluster nicely laid out.
>>  Something
>> > > odd
>> > > > is
>> > > > > going on.  What about things like the write speed to disk?  In the
>> > past
>> > > > > strange issues have been explained by incorrectly set BIOS which
>> made
>> > > > disks
>> > > > > run at 1/100th of their proper speed.
>> > > > >
>> > > > > St.Ack
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > >
>> > > > > > Best,
>> > > > > > zhenyu
>> > > > > >
>> > > > > >
>> > > > > > On Wed, Nov 11, 2009 at 3:58 PM, Zhenyu Zhong <
>> > > [email protected]
>> > > > > > >wrote:
>> > > > > >
>> > > > > > > Stack
>> > > > > > >
>> > > > > > > I am very appreciated for your comments.
>> > > > > > > I will use the zookeeper monitoring script on my cluster and
>> let
>> > it
>> > > > run
>> > > > > > > overnight to see the result.
>> > > > > > > I will follow up the thread when I get anything.
>> > > > > > >
>> > > > > > > thanks
>> > > > > > > zhenyu
>> > > > > > >
>> > > > > > >
>> > > > > > > On Wed, Nov 11, 2009 at 3:52 PM, stack <[email protected]>
>> wrote:
>> > > > > > >
>> > > > > > >> I see these in your log too:
>> > > > > > >>
>> > > > > > >>
>> > > > > > >>   1. 2009-11-11 04:27:19,018 DEBUG
>> > > org.apache.zookeeper.ClientCnxn:
>> > > > > Got
>> > > > > > >>   ping response for sessionid:0x424dfd908c50009 after 4544ms
>> > > > > > >>   2. 2009-11-11 04:27:19,018 DEBUG
>> > > org.apache.zookeeper.ClientCnxn:
>> > > > > Got
>> > > > > > >>   ping response for sessionid:0x24dfd90c810002 after 4548ms
>> > > > > > >>   3. 2009-11-11 04:27:43,960 DEBUG
>> > > org.apache.zookeeper.ClientCnxn:
>> > > > > Got
>> > > > > > >>   ping response for sessionid:0x424dfd908c50009 after 9030ms
>> > > > > > >>   4. 2009-11-11 04:27:43,960 DEBUG
>> > > org.apache.zookeeper.ClientCnxn:
>> > > > > Got
>> > > > > > >>   ping response for sessionid:0x24dfd90c810002 after 9030ms
>> > > > > > >>   5. 2009-11-11 04:27:48,830 DEBUG
>> > > org.apache.zookeeper.ClientCnxn:
>> > > > > Got
>> > > > > > >>   ping response for sessionid:0x424dfd908c50009 after 0ms
>> > > > > > >>   6. 2009-11-11 04:27:48,830 DEBUG
>> > > org.apache.zookeeper.ClientCnxn:
>> > > > > Got
>> > > > > > >>   ping response for sessionid:0x24dfd90c810002 after 0ms
>> > > > > > >>
>> > > > > > >> Which would indicate lag talking to zk too.
>> > > > > > >>
>> > > > > > >> Appending HLog also seems to get steadily worse.
>> > > > > > >>
>> > > > > > >> I also see these:
>> > > > > > >>
>> > > > > > >>
>> > > > > > >>   1. 2009-11-11 04:27:01,233 WARN
>> > org.apache.zookeeper.ClientCnxn:
>> > > > > > >>    Exception closing session 0x24dfd90c810002 to
>> > > > > > >>    sun.nio.ch.selectionkeyi...@5e16a7e3
>> > > > > > >>   2. java.io.IOException: Read error rc = -1
>> > > > > > >>    java.nio.DirectByteBuffer[pos=0 lim=4 cap=4]
>> > > > > > >>    3.         at
>> > > > > > >>
>> > > > >
>>  org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701)
>> > > > > > >>   4.         at
>> > > > > > >>
>> > > >  org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
>> > > > > > >>
>> > > > > > >>
>> > > > > > >> Were all 5 quorum members participating?  Check by doing
>> "echo
>> > > > stat|nc
>> > > > > > >> QUORUM_HOST port" for each member of quorum.... the below
>> might
>> > > > help:
>> > > > > > >>
>> > > > > > >> members="1,2,3,4,5"
>> > > > > > >> while true; do date; for i in $members; do echo $i; echo stat
>> |
>> > nc
>> > > > $i
>> > > > > > >> 2181;
>> > > > > > >> done; sleep 10; done
>> > > > > > >>
>> > > > > > >> We need to improve our zk_dump.  It just reports quorum
>> > members...
>> > > > not
>> > > > > > if
>> > > > > > >> they are actually participating or not.  The above will tell
>> you
>> > > if
>> > > > > dead
>> > > > > > >> or
>> > > > > > >> alive.
>> > > > > > >>
>> > > > > > >> Please report back after checking above and following the
>> > > > Jean-Daniel
>> > > > > > >> prescription.
>> > > > > > >>
>> > > > > > >> Thanks,
>> > > > > > >> St.Ack
>> > > > > > >>
>> > > > > > >>
>> > > > > > >>
>> > > > > > >> On Wed, Nov 11, 2009 at 12:29 PM, Zhenyu Zhong <
>> > > > > [email protected]
>> > > > > > >> >wrote:
>> > > > > > >>
>> > > > > > >> > Thanks a lot for the help.
>> > > > > > >> >
>> > > > > > >> > I understand that the change of disk configurations should
>> be
>> > > done
>> > > > > > >> > progressively such that HDFS can balance and recover from a
>> > disk
>> > > > > going
>> > > > > > >> > away.
>> > > > > > >> >
>> > > > > > >> > The data we are trying to import into HBase is about
>> > 200million
>> > > > > rows,
>> > > > > > >> each
>> > > > > > >> > row is about hundreds of bytes. Basically, we read in a
>> text
>> > > file
>> > > > > from
>> > > > > > >> > HDFS,
>> > > > > > >> > then  for each row in the text file, we fetch data from
>> HBase
>> > > > table,
>> > > > > > we
>> > > > > > >> > compare the column value and update the latest value back
>> into
>> > > > > HBase.
>> > > > > > I
>> > > > > > >> was
>> > > > > > >> > suggested to use CheckAndPut.  Anyway this is the common
>> > usecase
>> > > > on
>> > > > > my
>> > > > > > >> > side.
>> > > > > > >> >
>> > > > > > >> >
>> > > > > > >> > Many thanks
>> > > > > > >> > zhenyu
>> > > > > > >> >
>> > > > > > >> >
>> > > > > > >> >
>> > > > > > >> > On Wed, Nov 11, 2009 at 3:03 PM, Jean-Daniel Cryans <
>> > > > > > >> [email protected]
>> > > > > > >> > >wrote:
>> > > > > > >> >
>> > > > > > >> > > 6 disks, 19 machines and you see those kind of issues?
>> I'm
>> > > quite
>> > > > > > >> > > surprised and I want to learn more about what you trying
>> to
>> > > to.
>> > > > > > >> > >
>> > > > > > >> > > But first, if hbase is managing ZK for you then you can
>> > change
>> > > > the
>> > > > > > >> > > value of hbase.zookeeper.property.dataDir so that it
>> points
>> > to
>> > > > its
>> > > > > > own
>> > > > > > >> > > mount. Make sure you also change dfs.data.dir in
>> > > > > conf/hdfs-site.xml
>> > > > > > >> > > (hadoop stuff) so that it stops pointing there (you
>> should
>> > > have
>> > > > 6
>> > > > > > >> > > values there at the moment if you configured it
>> correctly).
>> > > > Doing
>> > > > > > this
>> > > > > > >> > > could result in data loss if you do it all at once so do
>> the
>> > > > > change
>> > > > > > >> > > progressively. This also means that you may have to keep
>> a
>> > > > > seperate
>> > > > > > >> > > HDFS configuration for those particular nodes. Please ask
>> > more
>> > > > > > >> > > questions if it's not very clear.
>> > > > > > >> > >
>> > > > > > >> > > WRT to dedicating nodes to Zookeeeper, it makes sense
>> only
>> > if
>> > > > you
>> > > > > > have
>> > > > > > >> > > a lot of nodes but in your case, unless there's also CPU
>> > > > > contention,
>> > > > > > >> > > you're good with only a dedicated disk.
>> > > > > > >> > >
>> > > > > > >> > > Now for what you are trying to do, is this a massive
>> import
>> > > that
>> > > > > > goes
>> > > > > > >> > > over many hours? How big is each row? I'd love to get a
>> lot
>> > of
>> > > > > > details
>> > > > > > >> > > to help you optimize your cluster for your particular
>> > problem.
>> > > > > > >> > >
>> > > > > > >> > > Thx,
>> > > > > > >> > >
>> > > > > > >> > > J-D
>> > > > > > >> > >
>> > > > > > >> > > On Wed, Nov 11, 2009 at 11:51 AM, Zhenyu Zhong <
>> > > > > > >> [email protected]>
>> > > > > > >> > > wrote:
>> > > > > > >> > > > J-D,
>> > > > > > >> > > >
>> > > > > > >> > > > Thanks a lot for the insightful comments.
>> > > > > > >> > > > We are running 5 zookeeper quorums now. On these 5
>> quorum
>> > > > > servers,
>> > > > > > >> we
>> > > > > > >> > > also
>> > > > > > >> > > > run datanode, tasktracker and regionserver.
>> > > > > > >> > > > Now we have 6 disks per machine.
>> > > > > > >> > > > If it is the disk IO saturation, is there anyway to
>> > > configure
>> > > > > > >> zookeeper
>> > > > > > >> > > to
>> > > > > > >> > > > write to a seperate disk which is not the same one used
>> by
>> > > > HDFS?
>> > > > > > >> > > > Or shall we run zookeeper quorums on different nodes
>> other
>> > > > than
>> > > > > > >> HDFS?
>> > > > > > >> > > >
>> > > > > > >> > > > Are HLogs generated by Regionservers? Would it
>> alleviate
>> > the
>> > > > > issue
>> > > > > > >> if
>> > > > > > >> > we
>> > > > > > >> > > use
>> > > > > > >> > > > the first 14 for regionserver, and the rest of 5 for
>> > > > > zookeeper?(In
>> > > > > > >> that
>> > > > > > >> > > > sense, the node running zookeeper quorum won't run
>> > > > regionserver)
>> > > > > > >> > > >
>> > > > > > >> > > > I am very appreciated for your comments.
>> > > > > > >> > > >
>> > > > > > >> > > > Best,
>> > > > > > >> > > > zhenyu
>> > > > > > >> > > >
>> > > > > > >> > > >
>> > > > > > >> > > >
>> > > > > > >> > > > On Wed, Nov 11, 2009 at 2:30 PM, Jean-Daniel Cryans <
>> > > > > > >> > [email protected]
>> > > > > > >> > > >wrote:
>> > > > > > >> > > >
>> > > > > > >> > > >> The problem seems to be that the IO is saturated. When
>> > you
>> > > > see
>> > > > > > >> this:
>> > > > > > >> > > >>
>> > > > > > >> > > >> #
>> > > > > > >> > > >> 2009-11-11 04:28:10,308 WARN
>> > > > > > >> > > >> org.apache.hadoop.hbase.regionserver.HLog: IPC Server
>> > > handler
>> > > > 9
>> > > > > > on
>> > > > > > >> > > >> 60021 took 6819ms appending an edit to hlog;
>> > editcount=370
>> > > > > > >> > > >> #
>> > > > > > >> > > >> 2009-11-11 04:28:10,308 WARN
>> > > > > > >> > > >> org.apache.hadoop.hbase.regionserver.HLog: IPC Server
>> > > handler
>> > > > 2
>> > > > > > on
>> > > > > > >> > > >> 60021 took 4412ms appending an edit to hlog;
>> > editcount=371
>> > > > > > >> > > >>
>> > > > > > >> > > >> It's a sign that HDFS is slow. I guess that the
>> zookeeper
>> > > > > quorum
>> > > > > > >> > > >> members are on those same nodes using the same disks?
>> If
>> > > so,
>> > > > > > >> there's
>> > > > > > >> > > >> the problem because Zookeeper needs to be able to
>> commit
>> > > > > > >> transactions
>> > > > > > >> > > >> very fast and if your disks are already hogged then
>> they
>> > > have
>> > > > > to
>> > > > > > >> wait
>> > > > > > >> > > >> and you can get disconnections. Do you still have 19
>> > quorum
>> > > > > > >> members?
>> > > > > > >> > > >> How many disks do you have per machine?
>> > > > > > >> > > >>
>> > > > > > >> > > >> Thx
>> > > > > > >> > > >>
>> > > > > > >> > > >> J-D
>> > > > > > >> > > >>
>> > > > > > >> > > >> On Wed, Nov 11, 2009 at 11:22 AM, Zhenyu Zhong <
>> > > > > > >> > [email protected]
>> > > > > > >> > > >
>> > > > > > >> > > >> wrote:
>> > > > > > >> > > >> > Please take a look at the following regionserver
>> log.
>> > > > > > >> > > >> > The error started to come out from 4:24am UTC to
>> 4:29am
>> > > > UTC.
>> > > > > > >> > > >> >
>> > > > > > >> > > >> > http://pastebin.com/m3717d064
>> > > > > > >> > > >> >
>> > > > > > >> > > >> > Thanks in advance.
>> > > > > > >> > > >> >
>> > > > > > >> > > >> > Best,
>> > > > > > >> > > >> > zhenyu
>> > > > > > >> > > >> >
>> > > > > > >> > > >> >
>> > > > > > >> > > >> > On Wed, Nov 11, 2009 at 2:01 PM, Jean-Daniel Cryans
>> <
>> > > > > > >> > > [email protected]
>> > > > > > >> > > >> >wrote:
>> > > > > > >> > > >> >
>> > > > > > >> > > >> >> Could you paste a bigger log trace (interested to
>> see
>> > at
>> > > > > least
>> > > > > > >> all
>> > > > > > >> > > 300
>> > > > > > >> > > >> >> lines before that) in a pastebin?
>> > > > > > >> > > >> >>
>> > > > > > >> > > >> >> Thx,
>> > > > > > >> > > >> >>
>> > > > > > >> > > >> >> J-D
>> > > > > > >> > > >> >>
>> > > > > > >> > > >> >> On Wed, Nov 11, 2009 at 10:56 AM, Zhenyu Zhong <
>> > > > > > >> > > [email protected]
>> > > > > > >> > > >> >
>> > > > > > >> > > >> >> wrote:
>> > > > > > >> > > >> >> > J-D,
>> > > > > > >> > > >> >> >
>> > > > > > >> > > >> >> > Thanks for the reply.
>> > > > > > >> > > >> >> >
>> > > > > > >> > > >> >> > The problem I have is one or more of the
>> > regionservers
>> > > > > > >> disconnect
>> > > > > > >> > > from
>> > > > > > >> > > >> >> the
>> > > > > > >> > > >> >> > cluster.
>> > > > > > >> > > >> >> >
>> > > > > > >> > > >> >> > I checked the resource usage including memory
>> Heap
>> > > usage
>> > > > > for
>> > > > > > >> > > >> >> regionserver,
>> > > > > > >> > > >> >> > and found nothing suspicious. The regionserver
>> heap
>> > > > usage
>> > > > > is
>> > > > > > >> way
>> > > > > > >> > > below
>> > > > > > >> > > >> >> the
>> > > > > > >> > > >> >> > maxHeap defined in jvm.
>> > > > > > >> > > >> >> >
>> > > > > > >> > > >> >> > So I checked zookeeper log, then I found the
>> > following
>> > > > > > >> > > >> >> > BufferUnderflowException, eventually it leads to
>> > > > > > >> SocketException:
>> > > > > > >> > > >> >> Transport
>> > > > > > >> > > >> >> > endpoint is not connected.
>> > > > > > >> > > >> >> >
>> > > > > > >> > > >> >> >
>> > > > > > >> > > >> >> > I also checked the regionserver log. It looks
>> like
>> > the
>> > > > > > >> > regionserver
>> > > > > > >> > > >> tried
>> > > > > > >> > > >> >> to
>> > > > > > >> > > >> >> > connect to other zookeeper quorum server. It
>> > connects
>> > > > > > >> > successfully,
>> > > > > > >> > > >> but
>> > > > > > >> > > >> >> then
>> > > > > > >> > > >> >> > it complained about SocketException: transport
>> > > endpoint
>> > > > is
>> > > > > > not
>> > > > > > >> > > >> connected.
>> > > > > > >> > > >> >> >
>> > > > > > >> > > >> >> > So both zookeeper log and regionserver log told
>> me
>> > > that
>> > > > > the
>> > > > > > >> > > transport
>> > > > > > >> > > >> >> > endpoint is not connected. I don't know why.
>> > > > > > >> > > >> >> >
>> > > > > > >> > > >> >> > I would be very appreciated if you could point me
>> to
>> > > the
>> > > > > > right
>> > > > > > >> > > >> direction.
>> > > > > > >> > > >> >> >
>> > > > > > >> > > >> >> > Best,
>> > > > > > >> > > >> >> >
>> > > > > > >> > > >> >> >
>> > > > > > >> > > >> >> > Regionserver log:
>> > > > > > >> > > >> >> > 2009-11-11 04:24:53,718 INFO
>> > > > > > org.apache.zookeeper.ClientCnxn:
>> > > > > > >> > > >> Attempting
>> > > > > > >> > > >> >> > connection to server superpyxis0018/
>> > > > 192.168.100.132:2181
>> > > > > > >> > > >> >> > 2009-11-11 04:24:53,718 INFO
>> > > > > > org.apache.zookeeper.ClientCnxn:
>> > > > > > >> > > Priming
>> > > > > > >> > > >> >> > connection to
>> > > java.nio.channels.SocketChannel[connected
>> > > > > > >> local=/
>> > > > > > >> > > >> >> > 192.168.100.131:43065 remote=superpyxis0018/
>> > > > > > >> 192.168.100.132:2181
>> > > > > > >> > ]
>> > > > > > >> > > >> >> > 2009-11-11 04:24:53,718 INFO
>> > > > > > org.apache.zookeeper.ClientCnxn:
>> > > > > > >> > > Server
>> > > > > > >> > > >> >> > connection successful
>> > > > > > >> > > >> >> > 2009-11-11 04:24:53,768 INFO
>> > > > > > >> > > >> >> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer:
>> > > Got
>> > > > > > >> ZooKeeper
>> > > > > > >> > > >> event,
>> > > > > > >> > > >> >> > state: Disconnected, type: None, path: null
>> > > > > > >> > > >> >> > 2009-11-11 04:24:54,118 INFO
>> > > > > > org.apache.zookeeper.ClientCnxn:
>> > > > > > >> > > >> Attempting
>> > > > > > >> > > >> >> > connection to server superpyxis0017/
>> > > > 192.168.100.131:2181
>> > > > > > >> > > >> >> > 2009-11-11 04:24:54,118 INFO
>> > > > > > org.apache.zookeeper.ClientCnxn:
>> > > > > > >> > > Priming
>> > > > > > >> > > >> >> > connection to
>> > > java.nio.channels.SocketChannel[connected
>> > > > > > >> local=/
>> > > > > > >> > > >> >> > 192.168.100.131:46903 remote=superpyxis0017/
>> > > > > > >> 192.168.100.131:2181
>> > > > > > >> > ]
>> > > > > > >> > > >> >> > 2009-11-11 04:24:54,118 INFO
>> > > > > > org.apache.zookeeper.ClientCnxn:
>> > > > > > >> > > Server
>> > > > > > >> > > >> >> > connection successful
>> > > > > > >> > > >> >> > 2009-11-11 04:24:54,118 WARN
>> > > > > > org.apache.zookeeper.ClientCnxn:
>> > > > > > >> > > >> Exception
>> > > > > > >> > > >> >> > closing session 0x24dfd90c810002 to
>> > > > > > >> > > >> sun.nio.ch.selectionkeyi...@2dd78ccf
>> > > > > > >> > > >> >> > java.io.IOException: Read error rc = -1
>> > > > > > >> > > >> java.nio.DirectByteBuffer[pos=0
>> > > > > > >> > > >> >> > lim=4 cap=4]
>> > > > > > >> > > >> >> >        at
>> > > > > > >> > > >> >> >
>> > > > > > >> > >
>> > > > >
>> org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701)
>> > > > > > >> > > >> >> >        at
>> > > > > > >> > > >> >> >
>> > > > > > >> >
>> > > > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
>> > > > > > >> > > >> >> > 2009-11-11 04:24:54,118 WARN
>> > > > > > org.apache.zookeeper.ClientCnxn:
>> > > > > > >> > > Ignoring
>> > > > > > >> > > >> >> > exception during shutdown input
>> > > > > > >> > > >> >> > java.net.SocketException: Transport endpoint is
>> not
>> > > > > > connected
>> > > > > > >> > > >> >> >        at
>> > sun.nio.ch.SocketChannelImpl.shutdown(Native
>> > > > > > Method)
>> > > > > > >> > > >> >> >        at
>> > > > > > >> > > >> >> >
>> > > > > > >> > >
>> > > > > >
>> > > sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
>> > > > > > >> > > >> >> >        at
>> > > > > > >> > > >>
>> > > > sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
>> > > > > > >> > > >> >> >        at
>> > > > > > >> > > >> >> >
>> > > > > > >> > > >>
>> > > > > > >> >
>> > > > > >
>> > >
>> org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999)
>> > > > > > >> > > >> >> >        at
>> > > > > > >> > > >> >> >
>> > > > > > >> >
>> > > > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970)
>> > > > > > >> > > >> >> >
>> > > > > > >> > > >> >> >
>> > > > > > >> > > >> >> >
>> > > > > > >> > > >> >> >
>> > > > > > >> > > >> >> >
>> > > > > > >> > > >> >> > On Wed, Nov 11, 2009 at 1:40 PM, Jean-Daniel
>> Cryans
>> > <
>> > > > > > >> > > >> [email protected]
>> > > > > > >> > > >> >> >wrote:
>> > > > > > >> > > >> >> >
>> > > > > > >> > > >> >> >> I'm really not sure exactly what the situation
>> is
>> > > with
>> > > > > the
>> > > > > > >> > > >> >> >> java.nio.BufferUnderflowException, but what do
>> you
>> > > mean
>> > > > > by
>> > > > > > >> > > >> >> >> "regionserver disconnection"? Could you give
>> more
>> > > > > details?
>> > > > > > >> > > >> >> >>
>> > > > > > >> > > >> >> >> Thx,
>> > > > > > >> > > >> >> >>
>> > > > > > >> > > >> >> >> J-D
>> > > > > > >> > > >> >> >>
>> > > > > > >> > > >> >> >> On Wed, Nov 11, 2009 at 10:34 AM, Zhenyu Zhong <
>> > > > > > >> > > >> [email protected]
>> > > > > > >> > > >> >> >
>> > > > > > >> > > >> >> >> wrote:
>> > > > > > >> > > >> >> >> > I searched around and found HBASE-1645 "HBase
>> > could
>> > > > be
>> > > > > a
>> > > > > > >> > little
>> > > > > > >> > > >> more
>> > > > > > >> > > >> >> >> dainty
>> > > > > > >> > > >> >> >> > when zk is pulled out from under it"
>> > > > > > >> > > >> >> >> >
>> > > > > > >> > > >> >> >> > I am wondering whether my problem is related
>> to
>> > > this
>> > > > > > >> > HBase-1645.
>> > > > > > >> > > Or
>> > > > > > >> > > >> >> >> probably
>> > > > > > >> > > >> >> >> > has nothing to do with it?
>> > > > > > >> > > >> >> >> >
>> > > > > > >> > > >> >> >> > thanks
>> > > > > > >> > > >> >> >> >
>> > > > > > >> > > >> >> >> >
>> > > > > > >> > > >> >> >> > On Wed, Nov 11, 2009 at 12:02 PM, Zhenyu Zhong
>> <
>> > > > > > >> > > >> >> [email protected]
>> > > > > > >> > > >> >> >> >wrote:
>> > > > > > >> > > >> >> >> >
>> > > > > > >> > > >> >> >> >> Hi,
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >> >> >> I have been monitoring the system resources
>> such
>> > > as
>> > > > > > memory
>> > > > > > >> > Heap
>> > > > > > >> > > >> usage
>> > > > > > >> > > >> >> >> for
>> > > > > > >> > > >> >> >> >> Regionserver, load average, network
>> bandwidth,
>> > > etc.
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >> >> >> I experienced a disconnection last night.
>> When I
>> > > > > checked
>> > > > > > >> the
>> > > > > > >> > > >> memory
>> > > > > > >> > > >> >> Heap
>> > > > > > >> > > >> >> >> >> for Regionserver, it was 2336 which is way
>> below
>> > > the
>> > > > > max
>> > > > > > >> Heap
>> > > > > > >> > > >> 4096. I
>> > > > > > >> > > >> >> >> don't
>> > > > > > >> > > >> >> >> >> think it is Regionserver problem. The
>> requests
>> > > that
>> > > > > the
>> > > > > > >> > > >> regionserver
>> > > > > > >> > > >> >> got
>> > > > > > >> > > >> >> >> was
>> > > > > > >> > > >> >> >> >> about 126 which doesn't seem to be high. The
>> > load
>> > > > > > average
>> > > > > > >> is
>> > > > > > >> > > kind
>> > > > > > >> > > >> of
>> > > > > > >> > > >> >> >> high
>> > > > > > >> > > >> >> >> >> >100. (but I saw disconnections while the
>> load
>> > > > average
>> > > > > > is
>> > > > > > >> > > around
>> > > > > > >> > > >> 30).
>> > > > > > >> > > >> >> >> The
>> > > > > > >> > > >> >> >> >> bandwidth is not a problem.
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >> >> >> However, I found error messages from the
>> > > zookeeper.
>> > > > > > Please
>> > > > > > >> > see
>> > > > > > >> > > the
>> > > > > > >> > > >> >> >> >> following error messages. There are some
>> > > > > > >> > > >> >> >> >> BufferUnderflowException error, IOException
>> > > > error,etc.
>> > > > > > >> > > >> >> >> >> May I ask what might be the causes that
>> generate
>> > > > these
>> > > > > > >> error
>> > > > > > >> > > >> >> messages?
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >> >> >> Many thanks
>> > > > > > >> > > >> >> >> >> zhenyu
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >> >> >> 2009-11-11 04:29:00,810 WARN
>> > > > > > >> > > org.apache.zookeeper.server.Request:
>> > > > > > >> > > >> >> >> Ignoring
>> > > > > > >> > > >> >> >> >> exception during toString
>> > > > > > >> > > >> >> >> >> java.nio.BufferUnderflowException
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > java.nio.ByteBuffer.get(ByteBuffer.java:675)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >>
>> > > > > > >>
>> org.apache.zookeeper.server.Request.toString(Request.java:199)
>> > > > > > >> > > >> >> >> >>         at
>> > > > java.lang.String.valueOf(String.java:2827)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > java.lang.StringBuilder.append(StringBuilder.java:115)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >> >>
>> > > > > > >> > > >> >>
>> > > > > > >> > > >>
>> > > > > > >> > >
>> > > > > > >> >
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.zookeeper.server.quorum.CommitProcessor.processRequest(CommitProcessor.java:167)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >> >>
>> > > > > > >> > > >> >>
>> > > > > > >> > > >>
>> > > > > > >> > >
>> > > > > > >> >
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerRequestProcessor.java:68)
>> > > > > > >> > > >> >> >> >> 2009-11-11 04:29:00,810 WARN
>> > > > > > >> > > >> >> org.apache.zookeeper.server.NIOServerCnxn:
>> > > > > > >> > > >> >> >> >> Exception causing close of session
>> > > 0x424dfd908c50009
>> > > > > > >> > > >> >> >> >> due to java.io.IOException: Read error
>> > > > > > >> > > >> >> >> >> 2009-11-11 04:29:00,810 DEBUG
>> > > > > > >> > > >> >> org.apache.zookeeper.server.NIOServerCnxn:
>> > > > > > >> > > >> >> >> >> IOException stack trace
>> > > > > > >> > > >> >> >> >> java.io.IOException: Read error
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >>
>> > > > > > >> > >
>> > > > > >
>> > > org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:372)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >> >>
>> > > > > > >> > > >> >>
>> > > > > > >> > > >>
>> > > > > > >> > >
>> > > > > > >> >
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:221)
>> > > > > > >> > > >> >> >> >> 2009-11-11 04:29:00,810 DEBUG
>> > > > > > >> > > >> >> >> >>
>> > > org.apache.zookeeper.server.quorum.CommitProcessor:
>> > > > > > >> > Processing
>> > > > > > >> > > >> >> request::
>> > > > > > >> > > >> >> >> >> sessionid:0x424dfd908c50009 type:setWatches
>> > > > > > >> > > >> cxid:0xfffffffffffffff8
>> > > > > > >> > > >> >> >> >> zxid:0xfffffffffffffffe txntype:unknown n/a
>> > > > > > >> > > >> >> >> >> 2009-11-11 04:29:00,810 WARN
>> > > > > > >> > > org.apache.zookeeper.server.Request:
>> > > > > > >> > > >> >> >> Ignoring
>> > > > > > >> > > >> >> >> >> exception during toString
>> > > > > > >> > > >> >> >> >> java.nio.BufferUnderflowException
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > java.nio.ByteBuffer.get(ByteBuffer.java:675)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >>
>> > > > > > >>
>> org.apache.zookeeper.server.Request.toString(Request.java:199)
>> > > > > > >> > > >> >> >> >>         at
>> > > > java.lang.String.valueOf(String.java:2827)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > java.lang.StringBuilder.append(StringBuilder.java:115)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >> >>
>> > > > > > >> > > >> >>
>> > > > > > >> > > >>
>> > > > > > >> > >
>> > > > > > >> >
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:74)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >> >>
>> > > > > > >> > > >> >>
>> > > > > > >> > > >>
>> > > > > > >> > >
>> > > > > > >> >
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
>> > > > > > >> > > >> >> >> >> 2009-11-11 04:29:00,810 DEBUG
>> > > > > > >> > > >> >> >> >>
>> > org.apache.zookeeper.server.FinalRequestProcessor:
>> > > > > > >> Processing
>> > > > > > >> > > >> >> request::
>> > > > > > >> > > >> >> >> >> sessionid:0x424dfd908c50009 type:setWatches
>> > > > > > >> > > >> cxid:0xfffffffffffffff8
>> > > > > > >> > > >> >> >> >> zxid:0xfffffffffffffffe txntype:unknown n/a
>> > > > > > >> > > >> >> >> >> 2009-11-11 04:29:00,810 WARN
>> > > > > > >> > > org.apache.zookeeper.server.Request:
>> > > > > > >> > > >> >> >> Ignoring
>> > > > > > >> > > >> >> >> >> exception during toString
>> > > > > > >> > > >> >> >> >> java.nio.BufferUnderflowException
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > java.nio.ByteBuffer.get(ByteBuffer.java:675)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >>
>> > > > > > >>
>> org.apache.zookeeper.server.Request.toString(Request.java:199)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >>
>> > > > > > >>
>> > > >
>> org.apache.log4j.or.DefaultRenderer.doRender(DefaultRenderer.java:36)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > >
>> > > > org.apache.log4j.or.RendererMap.findAndRender(RendererMap.java:80)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >> >>
>> > > > > > >> > > >> >>
>> > > > > > >> > > >>
>> > > > > > >> > >
>> > > > > > >> >
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.log4j.spi.LoggingEvent.getRenderedMessage(LoggingEvent.java:362)
>> > > > > > >> > > >> >> >> >> at
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >> >>
>> > > > > > >> > > >> >>
>> > > > > > >> > > >>
>> > > > > > >> > >
>> > > > > > >> >
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.log4j.helpers.PatternParser$BasicPatternConverter.convert(PatternParser.java:403)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >> >>
>> > > > > > >> > > >> >>
>> > > > > > >> > > >>
>> > > > > > >> > >
>> > > > > > >> >
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.log4j.helpers.PatternConverter.format(PatternConverter.java:65)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >>
>> > > > > org.apache.log4j.PatternLayout.format(PatternLayout.java:502)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > >
>> > > > org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:302)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >> >>
>> > > > > > >> > > >> >>
>> > > > > > >> > > >>
>> > > > > > >> > >
>> > > > > > >> >
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.log4j.DailyRollingFileAppender.subAppend(DailyRollingFileAppender.java:359)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >>
>> > > > > > >>
>> org.apache.log4j.WriterAppender.append(WriterAppender.java:160)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >>
>> > > > > > >>
>> > > >
>> org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >> >>
>> > > > > > >> > > >> >>
>> > > > > > >> > > >>
>> > > > > > >> > >
>> > > > > > >> >
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >>
>> > org.apache.log4j.Category.callAppenders(Category.java:206)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > org.apache.log4j.Category.forcedLog(Category.java:391)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> org.apache.log4j.Category.debug(Category.java:260)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >> >>
>> > > > > > >> > > >> >>
>> > > > > > >> > > >>
>> > > > > > >> > >
>> > > > > > >> >
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:147)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >> >>
>> > > > > > >> > > >> >>
>> > > > > > >> > > >>
>> > > > > > >> > >
>> > > > > > >> >
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
>> > > > > > >> > > >> >> >> >> 2009-11-11 04:29:00,810 WARN
>> > > > > > >> > > org.apache.zookeeper.server.Request:
>> > > > > > >> > > >> >> >> Ignoring
>> > > > > > >> > > >> >> >> >> exception during toString
>> > > > > > >> > > >> >> >> >> sessionid:0x424dfd908c50009 type:setWatches
>> > > > > > >> > > >> cxid:0xfffffffffffffff8
>> > > > > > >> > > >> >> >> >> zxid:0xfffffffffffffffe txntype:unknown n/a
>> > > > > > >> > > >> >> >> >> 2009-11-11 04:29:00,810 INFO
>> > > > > > >> > > >> >> org.apache.zookeeper.server.NIOServerCnxn:
>> > > > > > >> > > >> >> >> >> closing session:0x424dfd908c50009
>> NIOServerCnxn:
>> > > > > > >> > > >> >> >> >> java.nio.channels.SocketChannel[connected
>> > local=/
>> > > > > > >> > > >> >> 192.168.100.131:2181
>> > > > > > >> > > >> >> >> remote=/
>> > > > > > >> > > >> >> >> >> 192.168.100.131:35500]
>> > > > > > >> > > >> >> >> >> 2009-11-11 04:29:00,810 DEBUG
>> > > > > > >> > > >> >> org.apache.zookeeper.server.NIOServerCnxn:
>> > > > > > >> > > >> >> >> >> ignoring exception during input shutdown
>> > > > > > >> > > >> >> >> >> java.net.SocketException: Transport endpoint
>> is
>> > > not
>> > > > > > >> connected
>> > > > > > >> > > >> >> >> >>         at
>> > > > > sun.nio.ch.SocketChannelImpl.shutdown(Native
>> > > > > > >> > Method)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >>
>> > > > > > >> > >
>> > > > > >
>> > > sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >>
>> > > > > > >>
>> sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >>
>> > > > > > >> > >
>> > > > > > >>
>> > > > >
>> > org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:848)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >>
>> > > > > > >> > >
>> > > > > >
>> > > org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:498)
>> > > > > > >> > > >> >> >> >>         at
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >> >>
>> > > > > > >> > > >> >>
>> > > > > > >> > > >>
>> > > > > > >> > >
>> > > > > > >> >
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:221)
>> > > > > > >> > > >> >> >> >>
>> > > > > > >> > > >> >> >> >
>> > > > > > >> > > >> >> >>
>> > > > > > >> > > >> >> >
>> > > > > > >> > > >> >>
>> > > > > > >> > > >> >
>> > > > > > >> > > >>
>> > > > > > >> > > >
>> > > > > > >> > >
>> > > > > > >> >
>> > > > > > >>
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>