[ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689645#action_12689645 ]
Suresh Srinivas commented on HADOOP-4584:
-----------------------------------------

Here is why holding the FSDataset lock is not required:
# When reconciling, a difference flagged by the scanner is only a hint that the block should be checked. The actual state of the block on disk and in volumeMap is used to decide whether the difference still exists. This takes care of the following conditions:
#* Scanner finds a block, but the block is deleted after it was found
#* Scanner does not find a block that was added during the scan
# DirectoryScanner might miss differences for blocks that are added or deleted while it is comparing the in-memory block report against the disk. These differences will be found in the next iteration of the scanner.

I will add this information to the document as well.

> Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4584
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4584
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Suresh Srinivas
>             Fix For: 0.20.0
>
>         Attachments: 4584.brthread.2.patch, 4584.brthread.3.patch, 4584.brthread.3.patch, 4584.brthread.3.patch, 4584.brthread.3.patch, 4584.brthread.3.patch, 4584.hbthread.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, Design.pdf
>
>
> Sometimes, due to disk or other problems, a datanode takes minutes or tens of minutes to generate a block report. This prevents the datanode from sending a heartbeat to the NameNode every 3 seconds. In the worst case, the NameNode detects a lost heartbeat and wrongly decides that the datanode is dead.
> It would be nice to have two threads instead.
> One thread would scan the data directories, generate block reports, and execute the requests sent by the NameNode; the other would send heartbeats and block reports and pick up requests from the NameNode. With these two threads, the sending of heartbeats will not be delayed by a slow block report or by slow execution of NameNode requests.
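
The reasoning in the comment (scanner output is only a hint, and the actual state is re-checked before acting) can be illustrated with a small sketch. This is not code from the attached patches. The class `ReconcileSketch` and all of its methods are hypothetical, the disk is simulated with an in-memory set, and the brief lock taken to snapshot the in-memory map is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a datanode's block bookkeeping; not Hadoop's FSDataset. */
class ReconcileSketch {
    // In-memory view, block id -> file name (plays the role of volumeMap).
    private final Map<Long, String> volumeMap = new HashMap<>();
    // Simulated disk contents; a real scanner would list the data directories.
    private final Set<Long> disk = ConcurrentHashMap.newKeySet();

    /** Normal write path: the block lands on disk and in memory. */
    synchronized void addBlock(long id) {
        disk.add(id);
        volumeMap.put(id, "blk_" + id);
    }

    /** Simulates a block file that appears on disk without being tracked in memory. */
    void writeToDiskOnly(long id) { disk.add(id); }

    /** Simulates a block file disappearing from disk. */
    void removeFromDisk(long id) { disk.remove(id); }

    synchronized boolean isTracked(long id) { return volumeMap.containsKey(id); }

    /**
     * Slow scan done without holding the lock for its duration.
     * The result may already be stale when it is returned, so it is only a set of hints.
     */
    Set<Long> scan() {
        Set<Long> inMemory;
        synchronized (this) {               // brief snapshot only (assumption of this sketch)
            inMemory = new HashSet<>(volumeMap.keySet());
        }
        Set<Long> onDisk = new HashSet<>(disk);   // the expensive part in a real datanode
        Set<Long> hints = new TreeSet<>();
        for (Long id : inMemory) if (!onDisk.contains(id)) hints.add(id);
        for (Long id : onDisk) if (!inMemory.contains(id)) hints.add(id);
        return hints;
    }

    /**
     * Re-checks one hinted block against the current disk and memory state.
     * Returns true only if the difference still existed and was fixed.
     */
    synchronized boolean reconcile(long id) {
        boolean onDisk = disk.contains(id);
        boolean inMemory = volumeMap.containsKey(id);
        if (onDisk == inMemory) {
            return false;                   // stale hint: nothing to do
        }
        if (onDisk) {
            volumeMap.put(id, "blk_" + id); // on disk but untracked: start tracking
        } else {
            volumeMap.remove(id);           // tracked but gone from disk: forget it
        }
        return true;
    }
}
```

In this sketch, a hint for a block that was deleted after the scan saw it is ignored by `reconcile`, because disk and memory agree again by the time it runs. A block that changes while `scan` is running may be missed, and would show up as a hint on the next call to `scan`.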