[
https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859246#comment-16859246
]
star commented on HDFS-12914:
-----------------------------
[~hexiaoqiao], I also wrote a unit test for this issue; it is mostly similar to
yours. Pasting it here for reference.
Besides the test code, one piece of code changed: BlockManager#processReport
now throws an IOException to indicate an invalid lease id, so the client
receives the exception.
{code:java}
// Reject the report loudly instead of silently returning false.
if (context != null) {
  if (!blockReportLeaseManager.checkLease(node, startTime,
      context.getLeaseId())) {
    throw new IOException("Invalid block report lease id '"
        + context.getLeaseId() + "'");
  }
}
{code}
{code:java}
@Test
public void testDelayedBlockReport() throws IOException {
  FSNamesystem namesystem = cluster.getNameNode(0).getNamesystem();
  BlockManager testBlockManager = Mockito.spy(namesystem.getBlockManager());
  Mockito.doAnswer(new Answer<Boolean>() {
    @Override
    public Boolean answer(InvocationOnMock invocationOnMock) throws Throwable {
      // Sleep 1000 ms to delay processing of the current report.
      Thread.sleep(1000);
      return (Boolean) invocationOnMock.callRealMethod();
    }
  }).when(testBlockManager).processReport(
      Mockito.any(DatanodeID.class), Mockito.any(DatanodeStorage.class),
      Mockito.any(BlockListAsLongs.class),
      Mockito.any(BlockReportContext.class));
  namesystem.setBlockManagerForTesting(testBlockManager);

  String bpid = namesystem.getBlockPoolId();
  DataNode dn = cluster.getDataNodes().get(0);
  DatanodeRegistration dnReg = dn.getDNRegistrationForBP(bpid);

  // Request a lease under the read lock; always release the lock.
  long leaseId;
  namesystem.readLock();
  try {
    leaseId = testBlockManager.requestBlockReportLeaseId(dnReg);
  } finally {
    namesystem.readUnlock();
  }

  // Build a full block report for the first DataNode.
  Map<DatanodeStorage, BlockListAsLongs> report =
      cluster.getBlockReport(bpid, 0);
  List<StorageBlockReport> reportList = new ArrayList<>();
  for (Map.Entry<DatanodeStorage, BlockListAsLongs> en : report.entrySet()) {
    reportList.add(new StorageBlockReport(en.getKey(), en.getValue()));
  }

  // This call throws an IOException if the lease id is invalid.
  cluster.getNameNode().getRpcServer().blockReport(
      dnReg, bpid, reportList.toArray(new StorageBlockReport[0]),
      new BlockReportContext(1, 0, System.nanoTime(), leaseId, true));
}
{code}
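To illustrate why the exception matters, here is a minimal, simplified sketch. It is not the actual HDFS code: the class LeaseRejectionSketch and its methods are hypothetical stand-ins, and the lease check is reduced to comparing two ids. It only contrasts a rejection reported as a plain false return value with a rejection reported as an IOException.

```java
import java.io.IOException;

// Hypothetical, simplified model; none of these names exist in HDFS.
final class LeaseRejectionSketch {

  // Stand-in for a lease check: valid only if the ids match.
  static boolean checkLease(long expectedLeaseId, long reportedLeaseId) {
    return expectedLeaseId == reportedLeaseId;
  }

  // Rejection is signalled by returning false. The caller only sees a
  // boolean and cannot tell a rejected report from any other false result.
  static boolean processReportSilently(long expected, long reported) {
    if (!checkLease(expected, reported)) {
      return false;
    }
    return true;
  }

  // Rejection is signalled by an exception, so the sender learns that
  // the report was not accepted.
  static boolean processReportStrictly(long expected, long reported)
      throws IOException {
    if (!checkLease(expected, reported)) {
      throw new IOException(
          "Invalid block report lease id '" + reported + "'");
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println("silent result: " + processReportSilently(42L, 7L));
    try {
      processReportStrictly(42L, 7L);
    } catch (IOException e) {
      System.out.println("strict result: " + e.getMessage());
    }
  }
}
```

With the silent variant the sender has no signal to retry; with the strict variant the failure surfaces at the call site, which is what the test above relies on.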
> Block report leases cause missing blocks until next report
> ----------------------------------------------------------
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.8.0, 2.9.2
> Reporter: Daryn Sharp
> Assignee: Santosh Marella
> Priority: Critical
> Attachments: HDFS-12914-branch-2.001.patch,
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch,
> HDFS-12914.006.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for
> conditions such as "unknown datanode", "not in pending set", "lease has
> expired", wrong lease id, etc. Lease rejection does not throw an exception.
> It returns false, which bubbles up to {{NameNodeRpcServer#blockReport}} and
> is interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes
> active with _no blocks_. A replication storm ensues possibly causing DNs to
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on
> re-registration. The cluster will have many "missing blocks" until the DNs'
> next FBR is sent and/or forced.