On Nov 25, 2009, at 11:27 AM, David J. O'Dell wrote:

I've intermittently seen the following errors on both of my clusters; it happens when writing files. I was hoping this would go away with the new version, but I see the same behavior on both versions. The namenode logs don't show any problems; it's always on the client and datanodes.

[leaving errors below for reference]

I've seen similar errors on my 0.19.2 cluster when the cluster is decently busy. I've traced this more or less to the host in question doing verification on its blocks, an operation which seems to take the datanode out for upwards of 500 seconds in some cases.

In 0.19.2, if you look at o.a.h.hdfs.server.datanode.FSDataset.FSVolumeSet, you will see that all of its methods are synchronized. All dataset operations on the node seem to drop through methods in this class, which causes a backup whenever one thread holds the monitor for a long time.
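To make the failure mode concrete, here is a minimal sketch (not actual Hadoop code; the class and method names are illustrative) of the pattern: every method synchronizes on the same monitor, so a long verification pass stalls even trivial operations behind it.

```java
// ContentionDemo: all methods share one intrinsic lock, as in 0.19.2's
// FSVolumeSet, so a slow scan serializes everything else on the node.
public class ContentionDemo {
    static class VolumeSet {
        // Simulates a long-running block verification holding the monitor.
        synchronized void verifyBlocks(long millis) throws InterruptedException {
            Thread.sleep(millis);
        }

        // A normally-fast lookup, synchronized on the same monitor.
        synchronized long blockFileLength() {
            return 67108864L; // placeholder value
        }
    }

    public static void main(String[] args) throws Exception {
        VolumeSet volumes = new VolumeSet();
        Thread scanner = new Thread(() -> {
            try { volumes.verifyBlocks(500); } catch (InterruptedException ignored) {}
        });
        scanner.start();
        Thread.sleep(50); // let the scanner grab the monitor first
        long t0 = System.nanoTime();
        volumes.blockFileLength(); // blocks until verifyBlocks releases the monitor
        long waitedMs = (System.nanoTime() - t0) / 1_000_000;
        System.out.println("fast op waited ~" + waitedMs + " ms behind the scan");
        scanner.join();
    }
}
```

Scale the 500 ms sleep up to a multi-hundred-second verification and you get exactly the client-side socket timeouts quoted above.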

You can grab a few jstacks and use a dump analyzer (like https://tda.dev.java.net/) to poke through them to see if you have the same behavior.

I have not spent enough time digging into this to understand whether the whole dataset really needs to be locked during the operation or if the locks could be moved closer to the FSDir operations.
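If it turns out the whole dataset does not need to be locked, one shape the fix could take (purely a sketch under that assumption, not a patch against Hadoop; the Volume class below is illustrative) is one lock per volume, so a scan on one disk no longer stalls work on another:

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of pushing locks down toward individual volumes: a long block
// scan on volume A does not block a fast lookup on volume B.
public class PerVolumeLocking {
    static class Volume {
        private final ReentrantLock lock = new ReentrantLock();

        // Simulates a long verification pass over this volume's blocks.
        void verifyBlocks(long millis) throws InterruptedException {
            lock.lock();
            try {
                Thread.sleep(millis);
            } finally {
                lock.unlock();
            }
        }

        // A fast lookup; only contends with work on the same volume.
        long blockFileLength() {
            lock.lock();
            try {
                return 67108864L; // placeholder value
            } finally {
                lock.unlock();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Volume a = new Volume();
        Volume b = new Volume();
        Thread scanner = new Thread(() -> {
            try { a.verifyBlocks(500); } catch (InterruptedException ignored) {}
        });
        scanner.start();
        Thread.sleep(50); // ensure the scan is holding volume a's lock
        long t0 = System.nanoTime();
        b.blockFileLength(); // proceeds immediately: different volume, different lock
        long waitedMs = (System.nanoTime() - t0) / 1_000_000;
        System.out.println("lookup on the other volume waited ~" + waitedMs + " ms");
        scanner.join();
    }
}
```

Whether any cross-volume invariants in FSDataset actually permit this is exactly the open question.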

dave bayer

original log clips included below:

Client log:
09/11/25 10:54:15 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.1.75.11:37852 remote=/10.1.75.125:50010]
09/11/25 10:54:15 INFO hdfs.DFSClient: Abandoning block blk_-105422935413230449_22608
09/11/25 10:54:15 INFO hdfs.DFSClient: Waiting to find target node: 10.1.75.125:50010

Datanode log:
2009-11-25 10:54:51,170 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.1.75.125:50010, storageID=DS-1401408597-10.1.75.125-50010-1258737830230, infoPort=50075, ipcPort=50020):DataXceiver java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.1.75.104:50010]
      at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
      at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
      at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:282)
      at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
      at java.lang.Thread.run(Thread.java:619)
