[ http://issues.apache.org/jira/browse/HADOOP-641?page=all ]
Konstantin Shvachko reassigned HADOOP-641:
------------------------------------------

    Assignee: Konstantin Shvachko

> Name-node should demand a block report from resurrected data-nodes.
> -------------------------------------------------------------------
>
>                 Key: HADOOP-641
>                 URL: http://issues.apache.org/jira/browse/HADOOP-641
>             Project: Hadoop
>          Issue Type: Bug
>    Affects Versions: 0.7.2, 0.1.0
>            Reporter: Konstantin Shvachko
>         Assigned To: Konstantin Shvachko
>         Attachments: ResurrectDN.patch
>
>
> 1. This bug contributed to the crash discussed in HADOOP-572.
> The problem is that when the name-node is busy and cannot process all
> requests from its clients, it can consider one of the data-nodes dead and
> discard its blocks, sending them into the neededReplications list.
> When it finally gets a heartbeat from this data-node it resurrects the
> node, but not the data-node's blocks, and hence continues to replicate them.
> Of course, eventually the name-node will receive the block report from this
> data-node, but that could take up to 1 hour. During this time it proceeds
> with unnecessary block replications, which could be avoided if the
> data-node sent its block report right after the resurrection.
> I modified the code so that the name-node requests a block report if it
> receives a heartbeat from a dead data-node.
> I introduced a new command type in the BlockCommand class.
> I replaced multiple boolean indicators of the command types by one enum field.
> I changed the DatanodeProtocol version.
> 2. This patch also includes a fix for the data-node registration. If a
> data-node times out during registration it silently exits, which is hard to
> notice with a large number of nodes. This patch places registration in a
> loop, so that it can retry.

--
This message is automatically generated by JIRA.
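For readers without the patch at hand, the two changes described in the issue can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from ResurrectDN.patch: the names Action, ToyNameNode, ToyDataNode, Attempt and registerWithRetry are all hypothetical, and the bounded retry count is an assumption (the issue only says that registration is placed in a loop).

```java
import java.net.SocketTimeoutException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical command types: one enum field instead of several booleans. */
enum Action { NONE, REPLICATE, INVALIDATE, SEND_BLOCK_REPORT }

/** Toy name-node that tracks only whether each data-node is considered alive. */
class ToyNameNode {
    private final Map<String, Boolean> alive = new HashMap<>();

    void register(String nodeId) { alive.put(nodeId, true); }

    /** Heartbeats stopped arriving; a real name-node would also queue the node's blocks for replication. */
    void markDead(String nodeId) { alive.put(nodeId, false); }

    /** Returns the command to send back in the heartbeat reply. */
    Action handleHeartbeat(String nodeId) {
        Boolean wasAlive = alive.get(nodeId);
        if (wasAlive == null) {
            throw new IllegalStateException("unregistered data-node: " + nodeId);
        }
        alive.put(nodeId, true); // resurrect the node if it was dead
        // A resurrected node is asked for its block report right away,
        // instead of waiting for the next periodic report.
        return wasAlive ? Action.NONE : Action.SEND_BLOCK_REPORT;
    }
}

/** A single registration attempt that may time out (hypothetical interface). */
interface Attempt<T> { T run() throws SocketTimeoutException; }

class ToyDataNode {
    /** Retries registration on timeout instead of exiting silently. */
    static <T> T registerWithRetry(Attempt<T> attempt, int maxAttempts, long sleepMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SocketTimeoutException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.run();
            } catch (SocketTimeoutException e) {
                last = e;
                // Make the failure visible rather than silent.
                System.err.println("Registration attempt " + i + " timed out");
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while retrying", ie);
                }
            }
        }
        throw new IllegalStateException("registration failed after " + maxAttempts + " attempts", last);
    }
}
```

In this sketch a heartbeat from a live node yields Action.NONE, while the first heartbeat after markDead yields Action.SEND_BLOCK_REPORT, which is the behavior the issue asks for. A real data-node might retry indefinitely rather than give up after maxAttempts.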