    [ https://issues.apache.org/jira/browse/HADOOP-1297 ]
Marco Nicosia updated HADOOP-1297:
----------------------------------

Please include this in 0.12.4

> datanode sending block reports to namenode once every second
> -------------------------------------------------------------
>
>                 Key: HADOOP-1297
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1297
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
>             Fix For: 0.13.0
>
>         Attachments: datanodeDeleteBlocks2.patch
>
>
> The namenode is requesting a block to be deleted. The datanode tries this
> operation and encounters an error because the block is not in the blockMap.
> The processCommand() method raises an exception. The code is such that the
> variable lastBlockReport is not set if processCommand() raises an
> exception. This means that the datanode immediately sends another block
> report to the namenode, which eats up quite a bit of CPU on the namenode.
> In short, the above condition causes the datanode to send blockReports
> almost once every second!
> I propose that we do the following:
> 1. In DataNode.offerService, replace the following piece of code
>
>     DatanodeCommand cmd = namenode.blockReport(dnRegistration,
>                                                data.getBlockReport());
>     processCommand(cmd);
>     lastBlockReport = now;
>
> with
>
>     DatanodeCommand cmd = namenode.blockReport(dnRegistration,
>                                                data.getBlockReport());
>     lastBlockReport = now;
>     processCommand(cmd);
>
> (a sketch of the resulting loop follows below)
> 2. In FSDataSet.invalidate:
> a) continue to process all blocks in invalidBlks[] even if one in the
> middle encounters a problem.
> b) if getFile() returns null, still invoke volumeMap.get() and print
> whether we found the block in volumes or not. The volumeMap is used to
> generate the blockReport and this might help in debugging (see the second
> sketch below).
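A minimal sketch of the reordered loop from proposal 1, assuming a
simplified offerService() driven by a blockReportInterval field; the names
mirror the snippet above, but the surrounding loop is illustrative, not the
actual DataNode source:

    private void offerService() throws Exception {
      while (shouldRun) {
        long now = System.currentTimeMillis();
        if (now - lastBlockReport > blockReportInterval) {
          DatanodeCommand cmd = namenode.blockReport(dnRegistration,
                                                     data.getBlockReport());
          // Advance lastBlockReport before acting on the command: if
          // processCommand() throws (e.g. the block to delete is already
          // missing from blockMap), the timestamp is still updated and the
          // next iteration will not immediately resend a full block report.
          lastBlockReport = now;
          processCommand(cmd);
        }
        // ... heartbeats and sleep elided ...
      }
    }

With the original ordering, any exception from processCommand() skips the
assignment to lastBlockReport, so the (now - lastBlockReport) check stays
true and a full block report is re-sent on every pass through the loop.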
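And a sketch of proposal 2 for FSDataSet.invalidate(): iterate over every
entry of invalidBlks[], log failures instead of aborting, and consult
volumeMap when getFile() returns null. The volumeMap lookup, the logging
target, and the error-accumulation style are assumptions for illustration,
not the attached patch:

    public void invalidate(Block[] invalidBlks) throws IOException {
      boolean error = false;
      for (int i = 0; i < invalidBlks.length; i++) {
        File f = getFile(invalidBlks[i]);
        if (f == null) {
          // (b) the file is unknown; report whether volumeMap still has
          // the block, since block reports are generated from volumeMap
          // and a mismatch here is useful debugging information.
          Object vol = volumeMap.get(invalidBlks[i]);
          DataNode.LOG.warn("Unexpected error trying to delete block "
              + invalidBlks[i] + ": file not found"
              + (vol == null ? ", block not in volumeMap"
                             : ", but block is present in volumeMap"));
          error = true;
          continue;          // (a) keep going; do not abort the batch
        }
        if (!f.delete()) {
          DataNode.LOG.warn("Could not delete block file " + f);
          error = true;
          continue;
        }
        volumeMap.remove(invalidBlks[i]);
      }
      if (error) {
        throw new IOException("Error in deleting blocks.");
      }
    }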