A heartbeat is also an RPC. When you pause the NameNode for 30 seconds, each DataNode's heartbeat thread simply blocks for those 30 seconds, waiting for its heartbeat RPC to return. Note that RPCs to a paused NameNode don't fail immediately; they just wait. During this wait, DataNodes can still do other work, such as serving data to clients.
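
To make the behaviour concrete, here is a minimal Python sketch. It is not Hadoop code: FakeNameNode and simulate are hypothetical names, the "RPC" is just a lock-guarded method call, and the timings are scaled down from 3 s / 30 s. It only illustrates the idea that a blocked heartbeat call waits rather than fails, while another thread keeps serving reads.

```python
import threading
import time


class FakeNameNode:
    """Hypothetical stand-in for the NameNode (not a Hadoop class)."""

    def __init__(self):
        # Holding this lock plays the role of SIGSTOP on the NameNode JVM.
        self._gate = threading.Lock()
        self.heartbeats = 0

    def pause(self):
        self._gate.acquire()

    def resume(self):
        self._gate.release()

    def heartbeat(self):
        # The simulated RPC: while paused it does not fail, it just blocks.
        with self._gate:
            self.heartbeats += 1


def simulate(pause_seconds=0.3, interval=0.05):
    """Return (heartbeats during pause, reads served during pause,
    heartbeats since the pause began, measured shortly after resume)."""
    namenode = FakeNameNode()
    stop = threading.Event()

    def heartbeat_loop():
        # One dedicated thread, one heartbeat in flight at a time.
        while not stop.is_set():
            namenode.heartbeat()
            stop.wait(interval)

    worker = threading.Thread(target=heartbeat_loop, daemon=True)
    worker.start()
    time.sleep(interval * 3)  # let a few heartbeats go through

    namenode.pause()
    before = namenode.heartbeats
    reads_served = 0
    deadline = time.monotonic() + pause_seconds
    while time.monotonic() < deadline:
        reads_served += 1  # the "DataNode" keeps serving clients
        time.sleep(0.01)
    during_pause = namenode.heartbeats - before
    namenode.resume()

    time.sleep(interval * 3)  # the blocked heartbeat now completes
    after_resume = namenode.heartbeats - before
    stop.set()
    worker.join()
    return during_pause, reads_served, after_resume


if __name__ == "__main__":
    print(simulate())
```

In this sketch no heartbeat completes while the gate is held, reads continue throughout, and the pending heartbeat goes through once the gate is released. There is no burst of catch-up calls, because the single heartbeat thread was blocked in one call the whole time.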

B. X. wrote:
On Wed, Aug 19, 2009 at 7:10 PM, Owen O'Malley <omal...@apache.org> wrote:

Thank you both for clearing it up.

I have another related question: my understanding is that basic
heartbeat mechanisms are used to keep the different roles (namenode,
datanode, tasktracker, etc.) aware of each other, but I am not able to
observe this in the logs. For example, if I use the SIGSTOP/SIGCONT
mechanism to stop the namenode JVM process for 30 seconds and then
continue it, I don't observe any extra communication due to the
supposedly missed heartbeats. (I checked that dfs.heartbeat.interval is
set to 3 seconds.) Rather, what I saw is that all roles seem to stop in
unison for 30 seconds (judging by the absence of log events in that
time window).

I would appreciate some pointers on how heartbeats are used and configured.

-Bin
