[jira] Commented: (HADOOP-4584) Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode

Raghu Angadi (JIRA) Thu, 26 Feb 2009 08:55:24 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677059#action_12677059 ]


Raghu Angadi commented on HADOOP-4584:
--------------------------------------

I don't see any advantage to (3); based on more details, it might not even be 
correct. The requirement is that a BlockReport should be an "exact snapshot" of 
the blocks, i.e. no changes can happen to FSDataset from the time the block 
report starts until it ends. It does not matter which thread generates it. 
Processing commands and the block report in one thread makes sense since those 
need to happen serially.
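
To illustrate the "exact snapshot" requirement, here is a minimal sketch. 
SnapshotDataset and its methods are hypothetical stand-ins, not the real 
FSDataset API; the only point is that the report is built under the same lock 
that every mutation takes, so nothing can change between the start and the end 
of the report.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical stand-in for the DataNode's block set; NOT the real FSDataset. */
class SnapshotDataset {
    private final TreeSet<Long> blockIds = new TreeSet<>();

    // Every mutation takes the same monitor that blockReport() holds.
    public synchronized void addBlock(long id) {
        blockIds.add(id);
    }

    public synchronized void removeBlock(long id) {
        blockIds.remove(id);
    }

    /**
     * Copies the block ids while holding the lock, so no add or remove can
     * interleave with the copy. The returned list is a point-in-time snapshot
     * and does not reflect later changes.
     */
    public synchronized List<Long> blockReport() {
        return Collections.unmodifiableList(new ArrayList<>(blockIds));
    }
}
```

Note that this copies an in-memory set, which is cheap. If the report instead 
required a directory scan under the same lock, writers would stall for as long 
as the scan takes.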

Maybe (3) still has some advantage: could you give a specific example that 
shows it?

Fixing the block reports properly (with a directory scan once a day or so), 
i.e. "Option 3", in a separate jira is ok. But I would like to see that marked 
as a blocker at least for 0.20 or 0.21. I for one am pretty tired of replying 
"oh, that is a known issue and we need to fix it" every time users complain 
about it. Some users even ran a separate process that constantly scanned the 
directory tree to keep the inode info in kernel memory.


> Slow generation of blockReport at DataNode causes delay of sending heartbeat 
> to NameNode
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4584
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4584
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Suresh Srinivas
>             Fix For: 0.20.0
>
>         Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 
> 4584.patch, 4584.patch
>
>
> Sometimes, due to disk or other problems, a datanode takes minutes or tens 
> of minutes to generate a block report. This leaves the datanode unable to 
> send a heartbeat to the NameNode every 3 seconds. In the worst case, the 
> NameNode detects a lost heartbeat and wrongly decides that the datanode is 
> dead.
> It would be nice to have two threads instead. One thread scans the data 
> directories, generates the block report, and executes the requests sent by 
> the NameNode; the other thread sends heartbeats and block reports and 
> picks up the requests from the NameNode. With these two threads, the 
> sending of heartbeats will not get delayed by any slow block report or slow 
> execution of NameNode requests.
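
The two-thread idea in the description above can be sketched roughly as 
follows. This is only an illustration under assumptions: HeartbeatDemo and its 
method are hypothetical names, the sleep stands in for a slow directory scan, 
and the counter stands in for an RPC to the NameNode. It is not the code in any 
of the attached patches.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo: heartbeats keep firing while a slow report runs elsewhere. */
class HeartbeatDemo {
    /** Returns the number of heartbeats sent while the simulated slow report was running. */
    static int heartbeatsDuringSlowReport(long reportMillis, long heartbeatMillis) {
        AtomicInteger heartbeats = new AtomicInteger();
        ScheduledExecutorService heartbeatThread =
                Executors.newSingleThreadScheduledExecutor();
        ExecutorService workerThread = Executors.newSingleThreadExecutor();
        CountDownLatch reportDone = new CountDownLatch(1);
        try {
            // Thread 1: sends a "heartbeat" at a fixed rate (here it only counts).
            heartbeatThread.scheduleAtFixedRate(
                    heartbeats::incrementAndGet, 0, heartbeatMillis, TimeUnit.MILLISECONDS);
            // Thread 2: does the slow work (sleep stands in for a directory scan).
            workerThread.submit(() -> {
                try {
                    Thread.sleep(reportMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    reportDone.countDown();
                }
            });
            reportDone.await();
            return heartbeats.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return heartbeats.get();
        } finally {
            heartbeatThread.shutdownNow();
            workerThread.shutdownNow();
        }
    }
}
```

For example, HeartbeatDemo.heartbeatsDuringSlowReport(300, 20) should return 
several heartbeats even though the "report" blocks its own thread for the whole 
300 ms. The sketch leaves out how the two threads would coordinate access to the 
block data.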

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
