Name-node should demand a block report from resurrected data-nodes.
-------------------------------------------------------------------

                 Key: HADOOP-641
                 URL: http://issues.apache.org/jira/browse/HADOOP-641
             Project: Hadoop
          Issue Type: Bug
    Affects Versions: 0.7.2, 0.1.0
            Reporter: Konstantin Shvachko


1. This bug contributed to the crash discussed in HADOOP-572.
The problem is that when the name-node is busy and cannot process all requests
from its clients, it can consider one of the data-nodes dead and discard its
blocks, sending them into the neededReplications list.
When it finally gets a heartbeat from this data-node, it resurrects the node
but not the data-node's blocks, and hence continues to replicate them.
Of course, eventually the name-node will receive the block report from this
data-node, but that could take up to 1 hour. During this time it proceeds with
unnecessary block replications, which could be avoided if the data-node sent
its block report right after the resurrection.

I modified the code so that the name-node requests a block report if it
receives a heartbeat from a dead data-node.
I introduced a new command type in the BlockCommand class.
I replaced the multiple boolean indicators of the command type with one enum field.
I changed the DatanodeProtocol version.
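
The heartbeat handling described above can be sketched roughly as follows. This
is only an illustration, not the code from the patch: the class
HeartbeatSketch, the Action enum and all method names are hypothetical, and the
real BlockCommand and DatanodeProtocol classes are not reproduced here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these names are taken from the actual patch. */
class HeartbeatSketch {
    /** A single enum field stands in for several boolean command-type flags. */
    enum Action { NONE, TRANSFER, INVALIDATE, BLOCK_REPORT }

    /** Toy name-node tracking dead nodes and the blocks it believes each node holds. */
    static class NameNode {
        private final Set<String> deadNodes = new HashSet<>();
        private final Map<String, Set<Long>> blocksByNode = new HashMap<>();

        /** Heartbeat timed out: forget the node's blocks so they get re-replicated. */
        void markDead(String nodeId) {
            deadNodes.add(nodeId);
            blocksByNode.remove(nodeId);
        }

        /** Returns the command the data-node should execute in reply to a heartbeat. */
        Action handleHeartbeat(String nodeId) {
            if (deadNodes.remove(nodeId)) {
                // The node is alive again but its blocks are unknown: ask for a report now
                // instead of waiting for the next periodic one.
                return Action.BLOCK_REPORT;
            }
            return Action.NONE;
        }

        /** Records the blocks a data-node says it holds. */
        void processBlockReport(String nodeId, Set<Long> blocks) {
            blocksByNode.put(nodeId, new HashSet<>(blocks));
        }

        /** Number of blocks currently attributed to the node. */
        int knownBlockCount(String nodeId) {
            Set<Long> blocks = blocksByNode.get(nodeId);
            return blocks == null ? 0 : blocks.size();
        }
    }

    public static void main(String[] args) {
        NameNode nameNode = new NameNode();
        Set<Long> blocks = new HashSet<>(Arrays.asList(1L, 2L));
        nameNode.processBlockReport("dn1", blocks);
        nameNode.markDead("dn1");
        // The resurrected node is told to report its blocks immediately.
        if (nameNode.handleHeartbeat("dn1") == Action.BLOCK_REPORT) {
            nameNode.processBlockReport("dn1", blocks);
        }
        System.out.println("blocks known for dn1: " + nameNode.knownBlockCount("dn1"));
    }
}
```

In this sketch the reply to a heartbeat carries the command, so no extra
round trip is needed to trigger the report.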

2. This patch also includes a fix for data-node registration. If a data-node
times out during registration, it silently exits, which is hard to notice with
a large number of nodes. This patch places registration in a loop so that it
can retry.
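
A registration retry loop of this kind might look like the sketch below. It is
an assumption-laden illustration rather than the patched code: the
RegistrationSketch class, the Registrar interface, the attempt limit and the
fixed delay are all hypothetical, and the message above does not say whether
the real loop is bounded.

```java
import java.io.IOException;

/** Hypothetical sketch of retrying data-node registration; not the patched code. */
class RegistrationSketch {
    /** Stand-in for whatever call registers the data-node with the name-node. */
    interface Registrar {
        void register() throws IOException;
    }

    /**
     * Tries to register up to maxAttempts times, logging every failure so that
     * a node which cannot register is visible rather than silently gone.
     * Returns the number of attempts used; rethrows the last failure otherwise.
     */
    static int registerWithRetry(Registrar registrar, int maxAttempts, long retryDelayMs)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                registrar.register();
                return attempt;
            } catch (IOException e) {
                // Timeouts such as java.net.SocketTimeoutException are IOExceptions.
                lastFailure = e;
                System.err.println("Registration attempt " + attempt + " failed: " + e.getMessage());
                if (attempt < maxAttempts) {
                    Thread.sleep(retryDelayMs);
                }
            }
        }
        throw lastFailure;
    }
}
```

A caller could pass a lambda that performs the actual registration call and
choose the limit and delay to suit the cluster size.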



        
