[ 
https://issues.apache.org/jira/browse/HADOOP-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HADOOP-4679:
----------------------------------

    Attachment: diskError.patch

This patch sets DataNode.shouldRun to false when a disk error is detected 
while receiving a block. It also sets a 10s timeout on DataXceiverServer's 
server socket so that the DataXceiverServer wakes up periodically to check 
whether it should continue to run.
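
A minimal sketch of the accept-with-timeout pattern described above, not the 
actual patch. The class AcceptLoopSketch, its shouldRun field, and 
requestStop() are hypothetical stand-ins for DataXceiverServer and 
DataNode.shouldRun; only ServerSocket.setSoTimeout and SocketTimeoutException 
are real JDK APIs.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical illustration; names do not come from diskError.patch.
public class AcceptLoopSketch implements Runnable {
    // Stand-in for DataNode.shouldRun; volatile so the accept thread sees updates.
    private volatile boolean shouldRun = true;
    private final ServerSocket serverSocket;

    public AcceptLoopSketch(int timeoutMillis) throws IOException {
        serverSocket = new ServerSocket(0);        // bind to any free port
        serverSocket.setSoTimeout(timeoutMillis);  // accept() now times out
    }

    // Called, for example, by a receiver thread that hit a disk error.
    public void requestStop() {
        shouldRun = false;
    }

    @Override
    public void run() {
        try {
            while (shouldRun) {
                try (Socket s = serverSocket.accept()) {
                    // A real server would hand the socket to a worker thread here.
                } catch (SocketTimeoutException e) {
                    // No connection within the timeout: loop and re-check shouldRun.
                }
            }
        } catch (IOException e) {
            // Any other I/O failure ends the loop in this sketch.
        } finally {
            try {
                serverSocket.close();
            } catch (IOException ignored) {
                // Nothing useful to do if close fails.
            }
        }
    }
}
```

With the timeout in place, the accept thread notices a cleared flag within at 
most one timeout interval instead of blocking in accept() indefinitely.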

> Datanode prints tons of log messages: Waiting for threadgroup to exit, active 
> threads is XX
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4679
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4679
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>         Attachments: diskError.patch
>
>
> When a data receiver thread sees a disk error, it immediately calls shutdown 
> to shut down the DataNode. But the shutdown method does not return until all 
> data receiver threads exit, which will never happen. Therefore the DataNode 
> gets into a deadlock/livelock state, emitting tons of log messages: Waiting 
> for threadgroup to exit, active threads is XX.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
