[
https://issues.apache.org/jira/browse/HDFS-693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001979#comment-13001979
]
Uma Maheswara Rao G commented on HDFS-693:
------------------------------------------
In our observation this issue appears in long runs when DataNodes hold a very large
number of blocks. Every hour each DataNode sends its block report to the NameNode.
When the number of blocks on a DataNode is huge (in our setup: 3 DataNodes with 2 GB
RAM each, a Scribe server sending logs at 5000 records/s, 4 Scribe clients, 64 MB
block size), scanning all the blocks for the report takes a considerable amount of
time and generates a lot of I/O. If a write request arrives during this window, it
can take a long time to get a free I/O channel on the DataNode. As a result, while
the block scan is running, a DataNode may not be able to acknowledge client requests
in time, causing timeouts on the client sockets.
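If others want to experiment, one option might be to spread the block reports out so the
full scan runs less often. A minimal hdfs-site.xml sketch; the value below is purely
illustrative, not a tuned recommendation:
  <property>
    <name>dfs.blockreport.intervalMsec</name>
    <!-- illustrative value; the default is 3600000 (1 hour). A longer interval means
         the full block report scan runs less often on DataNodes with very many blocks. -->
    <value>10800000</value>
  </property>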
The same applies to the replication pipeline: if DN1 sends data to DN2 for replication
while DN2 is busy with its block scan, DN2 may not be able to send the ack back to DN1
in time, so timeouts can occur there as well.
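For the pipeline case (DN1 waiting on DN2's ack), the relevant knobs appear to be the
socket timeouts. Again a sketch with illustrative values only:
  <property>
    <name>dfs.socket.timeout</name>
    <!-- read timeout; default is 60000 (60 s). Raising it gives a busy downstream
         DataNode more time to acknowledge before the upstream node gives up. -->
    <value>180000</value>
  </property>
  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <!-- write timeout; default is 480000 (8 minutes), which matches the value seen
         in the reported exception. -->
    <value>960000</value>
  </property>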
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for
> channel to be ready for write exceptions were cast when trying to read file
> via StreamFile.
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-693
> URL: https://issues.apache.org/jira/browse/HDFS-693
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: data-node
> Affects Versions: 0.20.1
> Reporter: Yajun Dong
> Attachments: HDFS-693.log
>
>
> To exclude the case of a network problem: I found the count of dataXceivers is
> about 30. Also, the output of netstat -a | grep 50075 showed many connections in
> TIME_WAIT status when this happened.
> Partial log in attachment.
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira