Xiaoqiao He created HDFS-15113:
----------------------------------
Summary: Missing IBR when NameNode restart if open processCommand
async feature
Key: HDFS-15113
URL: https://issues.apache.org/jira/browse/HDFS-15113
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Reporter: Xiaoqiao He
Assignee: Xiaoqiao He
Recently, I meet one case that NameNode missing block after restart which is
related with HDFS-14997.
a. during NameNode restart, it will return command `DNA_REGISTER` to DataNode
when receive some RPC request from DataNode.
b. when DataNode receive `DNA_REGISTER` command, it will run #reRegister async.
{code:java}
void reRegister() throws IOException {
if (shouldRun()) {
// re-retrieve namespace info to make sure that, if the NN
// was restarted, we still match its version (HDFS-2120)
NamespaceInfo nsInfo = retrieveNamespaceInfo();
// and re-register
register(nsInfo);
scheduler.scheduleHeartbeat();
// HDFS-9917,Standby NN IBR can be very huge if standby namenode is down
// for sometime.
if (state == HAServiceState.STANDBY || state == HAServiceState.OBSERVER) {
ibrManager.clearIBRs();
}
}
}
{code}
c. As we know, #register will trigger BR immediately.
d. because #reRegister run async, so we could not make sure which one run first
between send FBR and clear IBR. If clean IBR run first, it will be OK. But if
send FBR first then clear IBR, it will missing some blocks received between
these two time point until next FBR.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]