I've tried to follow it as best I can. I've already increased the ulimit
to 32768 (the change I made is shown after the config below). This is what
I now have in my hdfs-site.xml. Am I missing anything?
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/media/sdb,/media/sdc,/media/sdd</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>10</value>
  </property>
</configuration>
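For the ulimit, the change I made was along these lines in
/etc/security/limits.conf (assuming here that the datanode and
regionserver both run as a dedicated "hadoop" user; adjust the user name
to whatever accounts actually run the daemons):

hadoop  soft  nofile  32768
hadoop  hard  nofile  32768

After logging back in as that user, "ulimit -n" reports 32768.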
Todd Lipcon wrote:
Hi Jeff,
Have you followed the HDFS configuration guide from the HBase wiki?
You need to bump up the transceiver count and probably ulimit as well.
Looks like you already tuned it to 2048, but that isn't high enough if
you're still getting the "exceeds the limit" message.
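Something along these lines in hdfs-site.xml on every datanode should do
it (4096 is only an example value, and note that the property name keeps
Hadoop's historical misspelling of "xceivers"):

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>

The datanodes need to be restarted to pick it up.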
The EOFs and Connection Reset messages occur when DFS clients disconnect
prematurely from a stream (probably due to xceiver errors on other
streams).
-Todd
On Fri, Jun 4, 2010 at 8:56 AM, jeff whiting <je...@qualtrics.com> wrote:
I had my HRegionServers go down due to an HDFS exception. In the
datanode logs I'm seeing a lot of different and varied exceptions.
I've increased the data xceiver count now, but these other ones
don't make a lot of sense.
Among them are:
2010-06-04 07:41:56,917 ERROR datanode.DataNode (DataXceiver.java:run(131)) - DatanodeRegistration(192.168.1.184:50010, storageID=DS-1601700079-192.168.1.184-50010-1274208308658, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:250)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
        at org.apache.hadoop.io.Text.readString(Text.java:400)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:313)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
        at java.lang.Thread.run(Thread.java:619)

2010-06-04 08:49:56,389 ERROR datanode.DataNode (DataXceiver.java:run(131)) - DatanodeRegistration(192.168.1.184:50010, storageID=DS-1601700079-192.168.1.184-50010-1274208308658, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:206)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
        at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)

2010-06-04 05:36:54,840 ERROR datanode.DataNode (DataXceiver.java:run(131)) - DatanodeRegistration(192.168.1.184:50010, storageID=DS-1601700079-192.168.1.184-50010-1274208308658, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 2049 exceeds the limit of concurrent xcievers 2047
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
        at java.lang.Thread.run(Thread.java:619)

2010-06-04 05:36:48,848 ERROR datanode.DataNode (DataXceiver.java:run(131)) - DatanodeRegistration(192.168.1.184:50010, storageID=DS-1601700079-192.168.1.184-50010-1274208308658, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.184:50010 remote=/192.168.1.184:55349]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
        at java.lang.Thread.run(Thread.java:619)
--
The EOFException is the most common one I get. I'm also unsure
how I would get a "Connection reset by peer" when I'm connecting
locally. Why is the file ending prematurely? Any idea what is
going on?
Thanks,
~Jeff
--
Jeff Whiting
Qualtrics Senior Software Engineer
je...@qualtrics.com
--
Todd Lipcon
Software Engineer, Cloudera
--
Jeff Whiting
Qualtrics Senior Software Engineer
je...@qualtrics.com