[ http://issues.apache.org/jira/browse/HADOOP-641?page=all ]

Konstantin Shvachko updated HADOOP-641:
---------------------------------------

    Status: Patch Available  (was: Open)

> Name-node should demand a block report from resurrected data-nodes.
> -------------------------------------------------------------------
>
>                 Key: HADOOP-641
>                 URL: http://issues.apache.org/jira/browse/HADOOP-641
>             Project: Hadoop
>          Issue Type: Bug
>    Affects Versions: 0.7.2, 0.1.0
>            Reporter: Konstantin Shvachko
>         Attachments: ResurrectDN.patch
>
>
> 1. This bug contributed to the crash discussed in HADOOP-572.
> The problem is that when the name-node is busy and cannot process all requests from its
> clients, it can consider one of the data-nodes dead and discard its blocks, sending them
> into the neededReplications list.
> When it finally gets a heartbeat from this data-node, it resurrects the node but not the
> data-node's blocks, and hence continues to replicate them.
> Of course, the name-node will eventually receive a block report from this data-node, but
> that could take up to 1 hour. During this time it proceeds with unnecessary block
> replications, which could be avoided if the data-node sent its block report right after
> the resurrection.
> I modified the code so that the name-node requests a block report if it receives a
> heartbeat from a dead data-node.
> I introduced a new command type in the BlockCommand class.
> I replaced the multiple boolean indicators of the command type with one enum field.
> I changed the DatanodeProtocol version.
> 2. This patch also includes a fix for data-node registration. If a data-node times out
> during registration, it silently exits, which is hard to notice with a large number of
> nodes. This patch places registration in a loop so that it can retry.
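
For readers without the attachment at hand, the two changes described above can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the contents of ResurrectDN.patch: the class ResurrectSketch, the Action enum and its constants, NameNodeSketch, and registerWithRetry are all hypothetical names chosen for illustration, and the real BlockCommand and DatanodeProtocol code may look quite different.

```java
import java.net.SocketTimeoutException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;

/** Hypothetical sketch; none of these names are taken from the actual patch. */
class ResurrectSketch {

    /** One enum field stands in for several boolean command-type flags. */
    enum Action { NONE, TRANSFER, INVALIDATE, BLOCK_REPORT }

    /** Toy name-node bookkeeping: node id -> whether the node is considered alive. */
    static class NameNodeSketch {
        private final Map<String, Boolean> alive = new HashMap<>();

        void register(String nodeId) { alive.put(nodeId, true); }

        /** Called when the node is declared dead after missing heartbeats. */
        void markDead(String nodeId) { alive.put(nodeId, false); }

        /** Returns the command the data-node should carry out after this heartbeat. */
        Action heartbeat(String nodeId) {
            Boolean wasAlive = alive.get(nodeId);
            if (wasAlive == null) {
                throw new IllegalStateException("unregistered node: " + nodeId);
            }
            if (!wasAlive) {
                // Resurrected node: its blocks were discarded, so demand a report now
                // instead of waiting for the next scheduled one.
                alive.put(nodeId, true);
                return Action.BLOCK_REPORT;
            }
            return Action.NONE;
        }
    }

    /** Data-node side: retry registration on timeout rather than exiting silently. */
    static <T> T registerWithRetry(Callable<T> attempt, int maxAttempts, long sleepMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.call();
            } catch (SocketTimeoutException e) {
                last = e;
                // Log loudly so that a stuck node is noticeable in a large cluster.
                System.err.println("Registration attempt " + i + " timed out");
                Thread.sleep(sleepMillis);
            }
        }
        throw last; // all attempts timed out
    }
}
```

In this sketch the retry count is bounded so that the example terminates; whether the patch retries indefinitely or gives up after some number of attempts is not stated in the description above.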

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
