[jira] Commented: (HADOOP-4584) Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode

Suresh Srinivas (JIRA) Mon, 09 Mar 2009 17:58:13 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680343#action_12680343 ]


Suresh Srinivas commented on HADOOP-4584:
-----------------------------------------

Based on the discussions so far, here is a proposal:
# DataBlockScanner will be enhanced to periodically check whether the blocks 
on the disk match the blocks in memory.
# One block list is compiled from the disk and another from the in-memory map. 
The two lists are compared to find the following inconsistencies:
## Block is in memory and not on the disk
## Block is on the disk and not in memory
## Block on the disk does not match the block in memory
# Differences are reconciled one at a time. For each one, the FSDataset lock 
is held to prevent further block changes, and the inconsistency is re-checked 
to confirm that it still exists (to account for changes that might have 
happened while the disk was being checked for block files):
## If a block file is missing on the disk, the block is deleted from memory
## If a block metadata file is missing on the disk, the block in memory is 
updated with a generation stamp of zero (as block reports currently do)
## If a block is missing in memory, it is added to FSDataset
## If the blocks do not match, the in-memory block is updated to reflect the 
block on the disk
## A block metadata file that has no corresponding block file is deleted 
from the disk
# Block report will be generated from the in-memory data
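
To make the proposal concrete, here is a rough sketch of the scan-then-reconcile flow. It is only an illustration, not the patch: the class and member names (BlockReconcileSketch, DiskBlock, MemBlock, datasetLock) are hypothetical, and plain maps stand in for both the data directories and the FSDataset block map. The point it tries to show is that the comparison runs on snapshots, while each fix re-checks the live state under the lock before changing anything.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these names come from the Hadoop source. */
class BlockReconcileSketch {

    /** What a disk scan found for one block ID (stand-in for real files). */
    static final class DiskBlock {
        final boolean hasBlockFile;
        final boolean hasMetaFile;
        final long length;
        final long genStamp;

        DiskBlock(boolean hasBlockFile, boolean hasMetaFile, long length, long genStamp) {
            this.hasBlockFile = hasBlockFile;
            this.hasMetaFile = hasMetaFile;
            this.length = length;
            this.genStamp = genStamp;
        }
    }

    /** In-memory record for one block (stand-in for an FSDataset entry). */
    static final class MemBlock {
        final long length;
        final long genStamp;

        MemBlock(long length, long genStamp) {
            this.length = length;
            this.genStamp = genStamp;
        }
    }

    final Map<Long, DiskBlock> disk = new HashMap<>();    // assumed stand-in for the data directories
    final Map<Long, MemBlock> memory = new HashMap<>();   // assumed stand-in for the in-memory map
    private final Object datasetLock = new Object();      // assumed stand-in for the FSDataset lock

    /** The in-memory record the disk state implies, or null if there is no block file. */
    static MemBlock expectedFromDisk(DiskBlock d) {
        if (d == null || !d.hasBlockFile) {
            return null;
        }
        // A missing metadata file is represented by generation stamp zero.
        return new MemBlock(d.length, d.hasMetaFile ? d.genStamp : 0L);
    }

    static boolean same(MemBlock a, MemBlock b) {
        if (a == null || b == null) {
            return a == b;
        }
        return a.length == b.length && a.genStamp == b.genStamp;
    }

    /** Compares snapshots of both lists, then fixes differences one at a time. */
    int scanAndReconcile() {
        // Copying the map stands in for the slow directory scan, done without the lock.
        Map<Long, DiskBlock> diskSnapshot = new HashMap<>(disk);
        Map<Long, MemBlock> memSnapshot;
        synchronized (datasetLock) {
            memSnapshot = new HashMap<>(memory);
        }
        Set<Long> ids = new HashSet<>(diskSnapshot.keySet());
        ids.addAll(memSnapshot.keySet());

        int fixed = 0;
        for (long id : ids) {
            DiskBlock d = diskSnapshot.get(id);
            boolean orphanMeta = d != null && !d.hasBlockFile;
            if (orphanMeta || !same(expectedFromDisk(d), memSnapshot.get(id))) {
                if (reconcileOne(id)) {
                    fixed++;
                }
            }
        }
        return fixed;
    }

    /** Re-checks one block under the lock and applies a fix only if still needed. */
    boolean reconcileOne(long id) {
        synchronized (datasetLock) {
            boolean changed = false;
            DiskBlock d = disk.get(id);
            if (d != null && !d.hasBlockFile) {
                disk.remove(id);          // metadata file without a block file: delete it
                changed = true;
            }
            MemBlock expected = expectedFromDisk(disk.get(id));
            if (!same(expected, memory.get(id))) {
                if (expected == null) {
                    memory.remove(id);    // block file missing on disk
                } else {
                    memory.put(id, expected); // add the missing block or update the mismatch
                }
                changed = true;
            }
            return changed;
        }
    }

    public static void main(String[] args) {
        BlockReconcileSketch s = new BlockReconcileSketch();
        s.memory.put(1L, new MemBlock(10, 5));               // in memory only
        s.disk.put(2L, new DiskBlock(true, true, 20, 7));    // on disk only
        System.out.println("fixed: " + s.scanAndReconcile());
    }
}
```

In this sketch the lock is taken once per difference rather than for the whole scan, so the time it is held does not grow with the number of blocks on the disk.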

> Slow generation of blockReport at DataNode causes delay of sending heartbeat 
> to NameNode
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4584
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4584
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Suresh Srinivas
>             Fix For: 0.20.0
>
>         Attachments: 4584.hbthread.patch, 4584.patch, 4584.patch, 4584.patch, 
> 4584.patch, 4584.patch, 4584.patch
>
>
> Sometimes, due to disk or other problems, a datanode takes minutes or tens 
> of minutes to generate a block report. This prevents the datanode from 
> sending a heartbeat to the NameNode every 3 seconds. In the worst case, the 
> NameNode detects a lost heartbeat and wrongly decides that the datanode is 
> dead.
> It would be nice to have two threads instead. One thread scans the data 
> directories, generates the block report, and executes the requests sent by 
> the NameNode; another thread sends heartbeats and block reports and picks 
> up the requests from the NameNode. With these two threads, the sending of 
> heartbeats will not be delayed by a slow block report or slow execution of 
> NameNode requests.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
