Hi,
recently we've been seeing frequent SocketTimeoutExceptions (STEs) on our
datanodes. We had previously fixed this issue by raising the handler counts
and dfs.datanode.max.xcievers (note that "xciever" is misspelled in the code
as well, so we're just being consistent).
We're using 0.19 with a couple of patches, none of which should affect any
of the areas in the stack trace.
We've seen this before and responded by upping the xciever limit, but the
current settings already seem very high. We're running 102 nodes.
Any hints would be appreciated.
<property>
  <name>dfs.datanode.handler.count</name>
  <value>300</value>
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>300</value>
</property>
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2000</value>
</property>
2009-09-24 17:48:13,648 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(10.16.160.79:50010,
storageID=DS-1662533511-10.16.160.79-50010-1219665628349, infoPort=50075,
ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 480000 millis timeout while waiting for
channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected local=/10.16.160.79:50010
remote=/10.16.134.78:34280]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:185)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:293)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:387)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:179)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:94)
        at java.lang.Thread.run(Thread.java:619)
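One observation: the 480000 millis in the exception matches the default datanode
socket write timeout (8 minutes), which fires when the remote reader stops pulling
data mid-block, so the handler/xciever limits may not be the bottleneck here. As a
hedged experiment (assuming our 0.19 build honors this property, which I believe it
does), the timeout can be raised, or reportedly disabled with 0:

```xml
<!-- Experimental, not a confirmed fix: raise the datanode write timeout
     above the default 480000 ms; a value of 0 is reported to disable it. -->
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>960000</value>
</property>
```

If the timeouts disappear with a larger value, that would point at slow or stalled
clients (e.g. tasks pausing mid-read) rather than datanode thread exhaustion.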