I had my HRegionServers go down due to HDFS exceptions. In the datanode logs I'm seeing a lot of different and varied exceptions. I've increased the data xceiver count now, but the other errors don't make a lot of sense to me.
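For reference, the xceiver bump was the usual hdfs-site.xml change on each datanode followed by a datanode restart (the property name really is spelled "xcievers" in our 0.20-era release; 4096 is just the value commonly suggested for HBase, shown here as an example rather than what's necessarily right for this cluster):

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>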
Among them are:

2010-06-04 07:41:56,917 ERROR datanode.DataNode (DataXceiver.java:run(131)) - DatanodeRegistration(192.168.1.184:50010, storageID=DS-1601700079-192.168.1.184-50010-1274208308658, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:250)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
        at org.apache.hadoop.io.Text.readString(Text.java:400)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:313)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
        at java.lang.Thread.run(Thread.java:619)

2010-06-04 08:49:56,389 ERROR datanode.DataNode (DataXceiver.java:run(131)) - DatanodeRegistration(192.168.1.184:50010, storageID=DS-1601700079-192.168.1.184-50010-1274208308658, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:206)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
        at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)

2010-06-04 05:36:54,840 ERROR datanode.DataNode (DataXceiver.java:run(131)) - DatanodeRegistration(192.168.1.184:50010, storageID=DS-1601700079-192.168.1.184-50010-1274208308658, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 2049 exceeds the limit of concurrent xcievers 2047
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
        at java.lang.Thread.run(Thread.java:619)

2010-06-04 05:36:48,848 ERROR datanode.DataNode (DataXceiver.java:run(131)) - DatanodeRegistration(192.168.1.184:50010, storageID=DS-1601700079-192.168.1.184-50010-1274208308658, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.184:50010 remote=/192.168.1.184:55349]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
        at java.lang.Thread.run(Thread.java:619)

The EOFException is the most common one I get. I'm also unsure how I would get a connection reset by peer when I'm connecting locally. Why is the file prematurely ending? Any idea of what is going on?

Thanks,
~Jeff

-- 
Jeff Whiting
Qualtrics Senior Software Engineer
je...@qualtrics.com