I have set dfs.datanode.max.xcievers=4096 and have swapping turned off.
Regionserver heap = 24 GB, Datanode heap = 1 GB.

On Fri, May 11, 2012 at 9:55 AM, sulabh choudhury <sula...@gmail.com> wrote:
> I have spent a lot of time trying to find a solution to this issue, but
> have had no luck. I think this is because of HBase's read/write pattern,
> but I do not see any related errors in the HBase logs.
> It does not look like a GC pause, but seeing several 480000 ms timeouts
> certainly suggests something is really slowing down the *writes* (I see
> this only on the write channel).
>
> In my DataNode logs I see tons of:
>
> 2012-05-11 09:34:30,953 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(10.10.2.102:50010,
> storageID=DS-1494937024-10.10.2.102-50010-1305755343443, infoPort=50075,
> ipcPort=50020):Got exception while serving
> blk_-5331817573170456741_12784653 to /10.10.2.102:
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for
> channel to be ready for *write*. ch :
> java.nio.channels.SocketChannel[connected local=/10.10.2.102:50010 remote=/10.10.2.102:46752]
>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:267)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:163)
>
> 2012-05-11 09:34:30,953 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(10.10.2.102:50010,
> storageID=DS-1494937024-10.10.2.102-50010-1305755343443, infoPort=50075,
> ipcPort=50020):DataXceiver
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for
> channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected local=/10.10.2.102:50010 remote=/10.10.2.102:46752]
>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:267)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:163)
>
> This block is mapped to an HBase region; from the NN logs:
>
> 2012-05-10 15:46:35,117 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.allocateBlock:
> /hbase/table1/5a84f3844b7fd049c73a78b78ba6c2cf/.tmp/1639371300072460962.
> blk_4283960240517860151_12781124
> 2012-05-10 15:47:18,000 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addStoredBlock: blockMap updated: 10.10.2.103:50010 is added
> to blk_4283960240517860151_12781124 size 134217728
> 2012-05-10 15:47:18,000 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addStoredBlock: blockMap updated: 10.10.2.102:50010 is added
> to blk_4283960240517860151_12781124 size 134217728
>
> I am running hbase-0.90.4-cdh3u3 on hadoop-0.20.2-cdh3u3
>
> --
> Thanks and Regards,
> Sulabh Choudhury
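For reference, both settings mentioned in this thread live in hdfs-site.xml on the DataNodes. A minimal sketch follows; the property names match this Hadoop generation (0.20.x/CDH3), and the timeout value shown is the stock default, which is why 480000 ms appears in the logs above — raising it only hides a slow consumer, it does not fix one:

```xml
<!-- hdfs-site.xml (DataNode side) — illustrative fragment -->
<configuration>
  <!-- Upper bound on concurrent DataXceiver threads per DataNode.
       Note the historical misspelling "xcievers" used by this version. -->
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>

  <!-- Write-side socket timeout in milliseconds; 480000 (8 minutes)
       is the default, matching the SocketTimeoutException above. -->
  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>480000</value>
  </property>
</configuration>
```

A restart of the DataNode (and, for the xcievers limit to matter to HBase, the RegionServers reading through it) is needed for changes to take effect.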