Does it say ulimit -n 32768
...as the second line in your log file on start of the master? Is the stack trace that complains about too many open files the same as the one in the blog post J-D put up?
St.Ack

On Fri, Nov 13, 2009 at 1:37 PM, Zhenyu Zhong <[email protected]> wrote:
> The ulimit file descriptors were set with fs.file-max = 1578334; also, in limits.conf the value was set to 32768. So these are way higher than the open descriptors for the running processes.
>
> thanks
> zhenyu

On Fri, Nov 13, 2009 at 4:33 PM, Stack <[email protected]> wrote:
> You upped the ulimit file descriptors as per the getting-started doc?

On Nov 13, 2009, at 1:26 PM, Zhenyu Zhong <[email protected]> wrote:
> Thanks a lot.
>
> Bad news is my kernel is still 2.6.26, but it was not a problem before. Very strange.
>
> zhenyu

On Fri, Nov 13, 2009 at 4:16 PM, Jean-Daniel Cryans <[email protected]> wrote:
> Looks like
> http://pero.blogs.aprilmayjune.org/2009/01/22/hadoop-and-linux-kernel-2627-epoll-limits/
>
> J-D

On Fri, Nov 13, 2009 at 1:12 PM, Zhenyu Zhong <[email protected]> wrote:
> Hi,
>
> After I reorganized the cluster, the experiment ran into problems faster than before. Basically, the changes were to use machines with fewer resources as zookeeper quorum members and machines with more resources as regionservers.
>
> From the log, I still see the pause of around 1 minute. I enabled GC logging; please see http://pastebin.com/m1d4ce0f1 for details. In general I don't see many pauses in the GC.
>
> What is more interesting, 1 hour after the 1st regionserver disconnected, the master log complained about too many open files. This didn't happen before.
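For anyone hitting the same wall: the numbers being compared above live at three different layers, and the limit a daemon actually runs under can differ from all of them. A minimal check, assuming a Linux box with the usual paths (PID discovery via jps/HMaster is an assumption about the deployment):

```shell
# Soft per-process limit inherited by anything started from this shell;
# this is the "ulimit -n ..." number HBase echoes at master startup
ulimit -n

# System-wide ceiling (fs.file-max), shared by all processes
cat /proc/sys/fs/file-max

# limits.conf entries only apply to PAM sessions, so a daemon launched
# from an init script may never see them
grep nofile /etc/security/limits.conf

# What a specific running process is really limited to, and how many
# descriptors it actually holds right now
pid=$(jps 2>/dev/null | awk '/HMaster/ {print $1}')
grep 'Max open files' /proc/"$pid"/limits
ls /proc/"$pid"/fd | wc -l
```

If /proc/&lt;pid&gt;/limits still says 1024 despite limits.conf saying 32768, the daemon was started outside a PAM session and never picked the setting up — which would explain "Too many open files" while the system-wide counters look fine.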
> I checked the system OS setup as well as limits.conf. I also checked the running processes. The total open files don't reach the limit, so I am a bit confused.
>
> Please see the following master log:
>
> 2009-11-13 20:06:02,114 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 4658 row(s) of meta region {server: 192.168.100.128:60021, regionname: .META.,,1, startKey: <>} complete
> 2009-11-13 20:06:02,114 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
> 2009-11-13 20:06:07,677 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x424eebf1c10004c after 3ms
> 2009-11-13 20:06:08,178 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.100.123:50010
> 2009-11-13 20:06:08,178 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-2808245019291145247_5478039
> 2009-11-13 20:06:09,682 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
> 2009-11-13 20:06:09,682 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_1074853606841896259_5478048
> 2009-11-13 20:06:10,334 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x24eebf1043003c after 1ms
> 2009-11-13 20:06:21,018 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x424eebf1c10004c after 0ms
> 2009-11-13 20:06:23,674 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x24eebf1043003c after 0ms
> 2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
> java.io.IOException: Bad connect ack with firstBadLink 192.168.100.123:50010
> 2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6642544517082142289_5478063
> 2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
> 2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_2057511041109796090_5478063
> 2009-11-13 20:06:24,928 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
> 2009-11-13 20:06:24,928 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_8219260302213892894_5478064
> 2009-11-13 20:06:30,855 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
> 2009-11-13 20:06:30,855 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_1669205542853067709_5478235
> 2009-11-13 20:06:30,905 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
> 2009-11-13 20:06:30,905 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_9128897691346270351_5478237
> 2009-11-13 20:06:30,955 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
> 2009-11-13 20:06:30,955 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_1116845144864123018_5478240
> 2009-11-13 20:06:34,372 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x424eebf1c10004c after 0ms
> 2009-11-13 20:06:37,034 DEBUG org.apache.zookeeper.ClientCnxn:
> Got ping response for sessionid:0x24eebf1043003c after 0ms
> 2009-11-13 20:06:37,235 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Too many open files
>         at sun.nio.ch.IOUtil.initPipe(Native Method)
>         at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49)
>         at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:407)
>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:322)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2290)
>
> 2009-11-13 20:06:37,235 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8148813491785406356_5478475 bad datanode[0] 192.168.100.123:50010
> 2009-11-13 20:06:37,235 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8148813491785406356_5478475 in pipeline 192.168.100.123:50010, 192.168.100.134:50010, 192.168.100.122:50010: bad datanode 192.168.100.123:50010
> 2009-11-13 20:06:37,436 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
> 2009-11-13 20:06:37,436 INFO
> org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_2119727700857186236_5478498

On Thu, Nov 12, 2009 at 4:21 PM, Zhenyu Zhong <[email protected]> wrote:
> Will do.
>
> thanks
> zhenyu

On Thu, Nov 12, 2009 at 3:33 PM, stack <[email protected]> wrote:
> Enable GC logging too on this next run (see hbase-env.sh). Let's try and get to the bottom of what's going on.
> Thanks,
> St.Ack

On Thu, Nov 12, 2009 at 12:29 PM, Zhenyu Zhong <[email protected]> wrote:
> I can switch the boxes that run zookeeper with the ones that run regionservers. I will see the results later.
>
> FYI, the node does have the 10-minute zookeeper.session.timeout value in place.
>
> thanks
> zhenyu

On Thu, Nov 12, 2009 at 3:21 PM, stack <[email protected]> wrote:
> On Thu, Nov 12, 2009 at 11:50 AM, Zhenyu Zhong <[email protected]> wrote:
>> In my cluster, half of the machines have 2 disks of 400GB each and half have 6 disks. Maybe we should run zookeeper on the machines with 2 disks and RS on the machines with 6 disks?
>
> That would make the most sense, only in the below it looks like the RS that had the issue had 4 disks?
>
>> BTW, the 10-minute zookeeper.session.timeout had been set during the experiment.
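The GC-logging suggestion comes down to a couple of lines in conf/hbase-env.sh. A minimal sketch with standard HotSpot flags of that era; the log path is an assumption:

```shell
# Append verbose GC flags to whatever HBASE_OPTS already holds; writing
# GC output to a dedicated file keeps it out of the main daemon log
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-hbase.log"
```

Any stop-the-world pause longer than the zk session timeout will show up in that file as a single long collection, which is exactly the correlation being hunted in this thread.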
> And for sure this node had it in place?
> St.Ack

>> thanks
>> zhenyu

On Thu, Nov 12, 2009 at 2:08 PM, stack <[email protected]> wrote:
> On Thu, Nov 12, 2009 at 8:40 AM, Zhenyu Zhong <[email protected]> wrote:
>> Though I experienced 2 regionserver disconnections this morning, it looks better from the regionserver log (please see the following log):
>>
>> http://pastebin.com/m496dbfae
>>
>> I checked disk I/O for one of the regionservers (192.168.100.116) during the time it disconnected:
>>
>> Time: 03:04:58 AM
>> Device:   tps      Blk_read/s   Blk_wrtn/s   Blk_read     Blk_wrtn
>> sda       105.31   5458.83      19503.64     9043873239   32312473676
>> sda1      2.90     6.64         99.25        10993934     164433464
>> sda2      1.72     23.76        30.25        39365817     50124008
>> sda3      0.30     0.38         3.58         624930       5923000
>> sda4      100.39   5428.06      19370.56     8992888270   32091993204
>
> Is this high for you? 20k blocks/second would seem to be high, but it's one disk only and it's not being shared by zk anymore, so it shouldn't matter?
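The device table above is iostat-style output; for chasing the slow-disk theory, extended stats and a raw write test are usually more conclusive. A sketch, where /data as the HDFS mount is an assumption:

```shell
# await and %util (iostat -x, from the sysstat package) expose a
# saturated or crippled disk more directly than raw blocks/sec;
# sample every 5 seconds, 3 samples
iostat -x 5 3

# Crude sequential-write benchmark: 1 GB forced to the platter before
# dd reports, so the page cache cannot flatter the number
dd if=/dev/zero of=/data/ddtest bs=1M count=1024 conv=fdatasync
rm -f /data/ddtest
```

A healthy disk of that era should sustain tens of MB/s on the dd test; a figure near 1 MB/s would point at the mis-set-BIOS failure mode mentioned elsewhere in the thread.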
>> I also checked the zookeeper quorum server that the regionserver tried to connect to, according to the log. However, I don't see 192.168.100.116 in the client list of that quorum server. Would that be a problem?
>
> Is that because the ephemeral node for the regionserver had evaporated? Lost its lease w/ zk by the time you went to look?
>
>> Thu Nov 12 15:04:52 UTC 2009
>> Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
>> Clients:
>> /192.168.100.127:43045[1](queued=0,recved=26,sent=0)
>> /192.168.100.131:39091[1](queued=0,recved=964,sent=0)
>> /192.168.100.124:35961[1](queued=0,recved=3266,sent=0)
>> /192.168.100.123:47935[1](queued=0,recved=233,sent=0)
>> /192.168.100.125:46931[1](queued=0,recved=2,sent=0)
>> /192.168.100.118:54924[1](queued=0,recved=295,sent=0)
>> /192.168.100.118:41390[1](queued=0,recved=2290,sent=0)
>> /192.168.100.136:42243[1](queued=0,recved=0,sent=0)
>>
>> Latency min/avg/max: 0/17/6333
>> Received: 47111
>> Sent: 0
>> Outstanding: 0
>> Zxid: 0x77000083f4
>> Mode: leader
>> Node count: 23
>
> That 6-second maximum latency is pretty bad but should be well within the zk
> session timeout.
>
> So the problem is likely on the zk client side of the session, i.e. in the RS. You could enable GC logging as J-D suggested to see if you have any big pauses, pauses > zk session timeout.
>
> When the RS went down, it didn't look too heavily loaded:
>
> 2009-11-12 15:04:52,830 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=1.5166667, regions=322, stores=657, storefiles=631, storefileIndexSize=61, memstoreSize=1472, usedHeap=2819, maxHeap=4079, blockCacheSize=658110960, blockCacheFree=197395984, blockCacheCount=9903, blockCacheHitRatio=99
>
> It's serving a few reads? The number of store files seems fine. Not too much memory used.
>
> Looking at the logs, I see the Lease Still Held exception. This happens when the RS does its regular report to the master but the master thinks the RS has since restarted. It'll think this probably because it noticed that the RS's znode in zk had gone away and it considered the RS dead.
>
> Looking too at your logs, I see this gap in the zk pinging:
>
> 2009-11-12 15:03:39,325 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x224e55436ad0004 after 0ms
> 2009-11-12 15:03:43,113 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x24e55436a0007d after 0ms
>
> Where in the lines above it, it's reporting about every ten seconds; here there is a big gap.
>
> Do you have ganglia or something that will let you look more into what was happening on this machine around this time? Is the machine OK? It looks lightly loaded and you have your cluster nicely laid out. Something odd is going on. What about things like the write speed to disk? In the past, strange issues have been explained by an incorrectly set BIOS which made disks run at 1/100th of their proper speed.
>
> St.Ack

>> Best,
>> zhenyu

On Wed, Nov 11, 2009 at 3:58 PM, Zhenyu Zhong <[email protected]> wrote:
> Stack,
>
> I really appreciate your comments.
> I will use the zookeeper monitoring script on my cluster and let it run overnight to see the result. I will follow up on this thread when I get anything.
>
> thanks
> zhenyu

On Wed, Nov 11, 2009 at 3:52 PM, stack <[email protected]> wrote:
> I see these in your log too:
>
> 2009-11-11 04:27:19,018 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x424dfd908c50009 after 4544ms
> 2009-11-11 04:27:19,018 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x24dfd90c810002 after 4548ms
> 2009-11-11 04:27:43,960 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x424dfd908c50009 after 9030ms
> 2009-11-11 04:27:43,960 DEBUG org.apache.zookeeper.
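Two small follow-up diagnostics for the pattern above. The multi-second ping responses can be pulled out of a regionserver log mechanically (the log filename is an assumption), and the client-list dump earlier in the thread matches the output of ZooKeeper's "stat" four-letter command, so the same snapshot can be taken on demand:

```shell
# Print ZooKeeper ping responses slower than one second; the "after
# NNNNms" value is the last token on ClientCnxn ping lines
awk '/Got ping response/ {
    ms = $NF                 # e.g. "4544ms"
    sub(/ms$/, "", ms)
    if (ms + 0 > 1000) print
}' regionserver.log

# Snapshot a quorum member's client list and latency stats (host is a
# placeholder; 2181 is the default ZooKeeper client port)
echo stat | nc zk-quorum-host 2181
```

Ping gaps or multi-second responses that line up with GC pauses in the gc log point at the client side; ones that don't point back at the quorum or the network.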
