[ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676910#action_12676910 ]
Konstantin Shvachko commented on HADOOP-4584:
---------------------------------------------

As I said, I propose to isolate in-memory block reports into a separate issue. Does anybody disagree with that?

As for the heartbeat thread, I would like to propose an alternative approach and discuss the pros and cons of the two.
# Currently we have a single thread (call it the offerService thread), which does all three operations: heartbeat with processing of the commands returned from the name-node, blockReceived, and blockReport.
# Suresh's current proposal is to separate heartbeats into a new thread (heartbeat thread), which also means creating a queue of commands returned from the name-node for processing by the offerService thread later on.
# My proposal is to separate block report preparation into a new thread (blockReport thread), which wakes up once an hour and prepares a block report. Once the report is ready, the offerService thread sends it to the name-node.

I think the last proposal (3) may have an advantage over (2), because in (2) we still delay blockReceived and the processing of commands from the name-node while the block report is being composed.

> Slow generation of blockReport at DataNode causes delay of sending heartbeat
> to NameNode
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4584
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4584
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Suresh Srinivas
>             Fix For: 0.20.0
>
>         Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch,
> 4584.patch, 4584.patch
>
>
> Sometimes, due to disk or other problems, a datanode takes minutes or tens
> of minutes to generate a block report. This prevents the datanode from
> sending a heartbeat to the NameNode every 3 seconds. In the worst case, the
> NameNode detects a lost heartbeat and wrongly decides that the datanode is
> dead.
> It would be nice to have two threads instead. One thread scans the data
> directories, generates the block report, and executes the requests sent by
> the NameNode; the other thread sends heartbeats and block reports and
> picks up the requests from the NameNode. With these two threads, the
> sending of heartbeats will not get delayed by a slow block report or slow
> execution of NameNode requests.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
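
For illustration only, proposal (3) from the comment above might be sketched roughly as follows. This is not DataNode code: the class `BlockReportSketch`, the methods `scanBlocks`, `startBlockReportThread`, `offerServiceOnce` and `demo`, and the string RPC log are all hypothetical names made up for this sketch. It assumes a single-slot hand-off between the blockReport thread and the offerService loop, and it prepares one report on demand rather than waking up once an hour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of proposal (3); names are illustrative, not Hadoop APIs. */
public class BlockReportSketch {
    /** Slot where the blockReport thread leaves a finished report. */
    private final AtomicReference<long[]> readyReport = new AtomicReference<>();
    /** Simulated RPCs, in the order the offerService loop issued them. */
    final List<String> rpcLog = new ArrayList<>();

    /** Stand-in for the slow scan of the data directories (assumption). */
    private long[] scanBlocks() {
        return new long[] {1L, 2L, 3L};
    }

    /** Starts a background thread that prepares a single report. */
    void startBlockReportThread(CountDownLatch done) {
        Thread t = new Thread(() -> {
            readyReport.set(scanBlocks()); // slow work stays off the offerService loop
            done.countDown();
        }, "blockReport");
        t.setDaemon(true);
        t.start();
    }

    /** One offerService iteration: it never waits for report preparation. */
    void offerServiceOnce() {
        rpcLog.add("heartbeat");
        rpcLog.add("blockReceived");
        long[] report = readyReport.getAndSet(null); // take the report only if it is ready
        if (report != null) {
            rpcLog.add("blockReport(" + report.length + ")");
        }
    }

    /** Runs two iterations, one before and one after the report is ready. */
    static List<String> demo() {
        BlockReportSketch dn = new BlockReportSketch();
        dn.offerServiceOnce(); // report not ready: heartbeat still goes out
        CountDownLatch done = new CountDownLatch(1);
        dn.startBlockReportThread(done);
        try {
            done.await(); // demo only: wait so the result is deterministic
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        dn.offerServiceOnce(); // report ready: sent in the same iteration
        return dn.rpcLog;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch the heartbeat and blockReceived calls are issued on every iteration whether or not a report is pending, which is the property proposal (3) is after. Sending the report still happens on the offerService thread, so only the preparation cost is moved off it.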