[ 
https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676363#action_12676363
 ] 

Raghu Angadi commented on HADOOP-4584:
--------------------------------------

> I have seen the 'rm -r' in practice, by the way .
A human error like this has a quick work around : to restart the datanode.

Improvements to block verification are required (mainly to catch more errors) 
and should be part of of block verifier, block report does not need to 
compensate for that.

The scan of directories once a day is fine might be more assuring to people. 
But it is very important not to hold the lock (as DN does now) even if the scan 
is less frequent. i.e., whether we scan once a day or once every block report, 
the implementation of the scan looks the same. So it still needs to be 
implemented ...


> Slow generation of blockReport at DataNode causes delay of sending heartbeat 
> to NameNode
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4584
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4584
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Suresh Srinivas
>             Fix For: 0.20.0
>
>         Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 
> 4584.patch, 4584.patch
>
>
> sometimes due to disk or some other problems, datanode takes minutes or tens 
> of minutes to generate a block report. It causes the datanode not able to 
> send heartbeat to NameNode every 3 seconds. In the worst case, it makes 
> NameNode to detect a lost heartbeat and wrongly decide that the datanode is 
> dead.
> It would be nice to have two threads instead. One thread is for scanning data 
> directories and generating block report, and executes the requests sent by 
> NameNode; Another thread is for sending heartbeats, block reports, and 
> picking up the requests from NameNode. By having these two threads, the 
> sending of heartbeats will not get delayed by any slow block report or slow 
> execution of NameNode requests.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to