Looks like http://pero.blogs.aprilmayjune.org/2009/01/22/hadoop-and-linux-kernel-2627-epoll-limits/
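If that's what is biting you, the workaround described there is raising the per-user epoll limit via sysctl, roughly along these lines (treat this as a sketch; the exact sysctl name depends on the kernel build):

  # /etc/sysctl.conf
  fs.epoll.max_user_instances = 1024

and then reload with sysctl -p.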
J-D
On Fri, Nov 13, 2009 at 1:12 PM, Zhenyu Zhong <[email protected]> wrote:
Hi,

After I re-organized the cluster, the experiment ran into problems sooner than before. Basically, the changes were to use the machines with fewer resources as zookeeper quorum members and the machines with more resources as regionservers.

From the log, I still see the pause of around 1 minute. I enabled GC logging; please see http://pastebin.com/m1d4ce0f1 for details. In general, I don't see many pauses in the GC.

What is more interesting, 1 hour after the 1st regionserver disconnected, the master log complained about too many open files. This didn't happen before. I checked the OS setup as well as limits.conf, and I also checked the running processes. The total number of open files doesn't reach the limit, so I am a bit confused.
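(For reference, checks along these lines cover both the configured limit and the live per-process count; <pid> is just a placeholder for the master or DFSClient process:

  ulimit -n                              # per-user shell limit
  grep 'open files' /proc/<pid>/limits   # limit applied to the running process
  ls /proc/<pid>/fd | wc -l              # descriptors the process currently holds
)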
Please see the following master log.
2009-11-13 20:06:02,114 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 4658 row(s) of meta region {server: 192.168.100.128:60021, regionname: .META.,,1, startKey: <>} complete
2009-11-13 20:06:02,114 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
2009-11-13 20:06:07,677 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x424eebf1c10004c after 3ms
2009-11-13 20:06:08,178 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.100.123:50010
2009-11-13 20:06:08,178 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-2808245019291145247_5478039
2009-11-13 20:06:09,682 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2009-11-13 20:06:09,682 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_1074853606841896259_5478048
2009-11-13 20:06:10,334 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x24eebf1043003c after 1ms
2009-11-13 20:06:21,018 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x424eebf1c10004c after 0ms
2009-11-13 20:06:23,674 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x24eebf1043003c after 0ms
2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.100.123:50010
2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6642544517082142289_5478063
2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:24,828 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_2057511041109796090_5478063
2009-11-13 20:06:24,928 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:24,928 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_8219260302213892894_5478064
2009-11-13 20:06:30,855 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:30,855 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_1669205542853067709_5478235
2009-11-13 20:06:30,905 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:30,905 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_9128897691346270351_5478237
2009-11-13 20:06:30,955 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:30,955 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_1116845144864123018_5478240
2009-11-13 20:06:34,372 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x424eebf1c10004c after 0ms
2009-11-13 20:06:37,034 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x24eebf1043003c after 0ms
2009-11-13 20:06:37,235 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Too many open files
        at sun.nio.ch.IOUtil.initPipe(Native Method)
        at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49)
        at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.get(SocketIOWithTimeout.java:407)
        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:322)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2290)
2009-11-13 20:06:37,235 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8148813491785406356_5478475 bad datanode[0] 192.168.100.123:50010
2009-11-13 20:06:37,235 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8148813491785406356_5478475 in pipeline 192.168.100.123:50010, 192.168.100.134:50010, 192.168.100.122:50010: bad datanode 192.168.100.123:50010
2009-11-13 20:06:37,436 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketException: Too many open files
2009-11-13 20:06:37,436 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_2119727700857186236_5478498
On Thu, Nov 12, 2009 at 4:21 PM, Zhenyu Zhong <[email protected]> wrote:
Will do.
thanks
zhenyu
On Thu, Nov 12, 2009 at 3:33 PM, stack <[email protected]> wrote:
Enable GC logging too on this next run (see hbase-env.sh). Let's try and get to the bottom of what's going on.
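In hbase-env.sh it amounts to adding the GC flags to HBASE_OPTS, something along these lines (the log path is only an example):

  export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/path/to/hbase/logs/gc-hbase.log"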
Thanks,
St.Ack
On Thu, Nov 12, 2009 at 12:29 PM, Zhenyu Zhong <[email protected]> wrote:
I can switch the boxes that run zookeeper with the ones that run regionservers. I will see the results later.
FYI, the node does have the 10-minute zookeeper.session.timeout value in place.
thanks
zhenyu
On Thu, Nov 12, 2009 at 3:21 PM, stack <[email protected]> wrote:
On Thu, Nov 12, 2009 at 11:50 AM, Zhenyu Zhong <[email protected]> wrote:
In my cluster, half of the machines have 2 disks (400GB each) and half have 6 disks per machine. Maybe we should run zookeeper on the machines with 2 disks and the RS on the machines with 6 disks?
That would make the most sense; only, from the below, it looks like the RS that had the issue had 4 disks?
BTW, the 10-minute zookeeper.session.timeout has been set during the experiment.
And for sure this node had it in place?
St.Ack
thanks
zhenyu
On Thu, Nov 12, 2009 at 2:08 PM, stack <[email protected]> wrote:
On Thu, Nov 12, 2009 at 8:40 AM, Zhenyu Zhong <[email protected]> wrote:
Though I experienced 2 regionserver disconnections this morning, it looks better from the regionserver log (please see the following log):
http://pastebin.com/m496dbfae
I checked disk I/O for one of the regionservers (192.168.100.116) around the time it disconnected:

Time: 03:04:58 AM
Device:  tps      Blk_read/s  Blk_wrtn/s  Blk_read     Blk_wrtn
sda      105.31   5458.83     19503.64    9043873239   32312473676
sda1     2.90     6.64        99.25       10993934     164433464
sda2     1.72     23.76       30.25       39365817     50124008
sda3     0.30     0.38        3.58        624930       5923000
sda4     100.39   5428.06     19370.56    8992888270   32091993204
Is this high for you? 20k blocks/second would seem to be high, but it's one disk only and it's not being shared by zk anymore, so it shouldn't matter?
I also checked the zookeeper quorum server that the regionserver tried to connect to, according to the log. However, I don't see 192.168.100.116 in the client list of that zookeeper quorum server. Would that be a problem?
Is that because the ephemeral node for the regionserver had evaporated? Lost its lease w/ zk by the time you went to look?
Thu Nov 12 15:04:52 UTC 2009
Zookeeper version: 3.2.1-808558, built on 08/27/2009 18:48 GMT
Clients:
/192.168.100.127:43045[1](queued=0,recved=26,sent=0)
/192.168.100.131:39091[1](queued=0,recved=964,sent=0)
/192.168.100.124:35961[1](queued=0,recved=3266,sent=0)
/192.168.100.123:47935[1](queued=0,recved=233,sent=0)
/192.168.100.125:46931[1](queued=0,recved=2,sent=0)
/192.168.100.118:54924[1](queued=0,recved=295,sent=0)
/192.168.100.118:41390[1](queued=0,recved=2290,sent=0)
/192.168.100.136:42243[1](queued=0,recved=0,sent=0)
Latency min/avg/max: 0/17/6333
Received: 47111
Sent: 0
Outstanding: 0
Zxid: 0x77000083f4
Mode: leader
Node count: 23
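(For reference, the status dump above looks like the output of ZooKeeper's 'stat' four-letter command, which can be pulled from a quorum member with something like:

  echo stat | nc <zk-quorum-host> 2181

where <zk-quorum-host> is a placeholder for the server being checked.)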
That 6-second maximum latency is pretty bad but should be well within the zk session timeout. So, the problem is likely on the zk client side of the session, i.e. in the RS. You could enable GC logging as J-D suggested to see if you have any big pauses, i.e. pauses > the zk session timeout.
When the RS went down, it didn't look too heavily loaded:
2009-11-12 15:04:52,830 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=1.5166667, regions=322, stores=657, storefiles=631, storefileIndexSize=61, memstoreSize=1472, usedHeap=2819, maxHeap=4079, blockCacheSize=658110960, blockCacheFree=197395984, blockCacheCount=9903, blockCacheHitRatio=99
It's serving a few reads? The number of store files seems fine. Not too much memory used.
Looking at the logs, I see the Lease Still Held exception. This happens when the RS does its regular report to the master but the master thinks the RS has since restarted. It'll think this probably because it noticed that the RS's znode in zk had gone away and it considered the RS dead.
Looking too at your logs, I see this gap in the zk pinging:
2009-11-12 15:03:39,325 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x224e55436ad0004 after 0ms
2009-11-12 15:03:43,113 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x24e55436a0007d after 0ms
In the lines above it, it's reporting about every ten seconds; here there is a big gap.
Do you have ganglia or something that will let you look more into what was happening on this machine around this time? Is the machine OK? It looks lightly loaded, and you have your cluster nicely laid out. Something odd is going on. What about things like the write speed to disk? In the past, strange issues have been explained by an incorrectly set BIOS which made disks run at 1/100th of their proper speed.
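A rough spot-check is something like the following, run against an otherwise idle disk (device and path here are only examples):

  hdparm -t /dev/sda                                                # sequential read throughput
  dd if=/dev/zero of=/data/ddtest bs=1M count=1024 conv=fdatasync   # sequential write
  rm /data/ddtest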
St.Ack
Best,
zhenyu
On Wed, Nov 11, 2009 at 3:58 PM, Zhenyu Zhong <[email protected]> wrote:
Stack,
I very much appreciate your comments.
I will use the zookeeper monitoring script on my cluster and let it run overnight to see the result. I will follow up on the thread when I get anything.
thanks
zhenyu
On Wed, Nov 11, 2009 at 3:52 PM, stack <[email protected]> wrote:
I see these in your log too:
2009-11-11 04:27:19,018 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x424dfd908c50009 after 4544ms
2009-11-11 04:27:19,018 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x24dfd90c810002 after 4548ms
2009-11-11 04:27:43,960 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x424dfd908c50009 after 9030ms
2009-11-11 04:27:43,960 DEBUG org.apache.zookeeper.