What about for 0.21? Also, where do you set this: in the DataNode configuration or the NameNode's? It seems the default is set to "3 seconds".
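The "3 seconds" default and the "10+ minutes" reported in this thread fit together if the NameNode derives its dead-node timeout from both the recheck interval and the heartbeat interval. Below is a minimal sketch of that arithmetic. It assumes the 0.20-era NameNode formula `expire = 2 * recheck_interval + 10 * heartbeat_interval`, a 5-minute default recheck interval, and the 3-second heartbeat default. Treat all three as assumptions to check against the source of the branch you run. `dead_node_timeout_ms` is a hypothetical helper, not a Hadoop API.

```python
# Sketch: estimate how long the NameNode waits before declaring a DataNode dead.
# ASSUMPTION: 0.20-era NameNode code computes
#   expire_ms = 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s
# with a 5-minute default recheck interval and a 3-second default heartbeat.
# dead_node_timeout_ms is a hypothetical helper, not part of Hadoop.

DEFAULT_RECHECK_INTERVAL_MS = 5 * 60 * 1000  # assumed default, milliseconds
DEFAULT_HEARTBEAT_INTERVAL_S = 3             # the "3 seconds" default, seconds


def dead_node_timeout_ms(recheck_interval_ms: int = DEFAULT_RECHECK_INTERVAL_MS,
                         heartbeat_interval_s: int = DEFAULT_HEARTBEAT_INTERVAL_S) -> int:
    """Return the assumed time (ms) without a heartbeat before a node is marked dead."""
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


if __name__ == "__main__":
    default_ms = dead_node_timeout_ms()
    print(f"defaults: {default_ms / 1000:.0f} s ({default_ms / 60000:.1f} min)")
    # Shortening only the recheck interval shrinks the window considerably.
    tuned_ms = dead_node_timeout_ms(recheck_interval_ms=15_000)
    print(f"recheck=15s: {tuned_ms / 1000:.0f} s")
```

Under these assumptions the defaults give 630 s (10.5 minutes), matching the "10+ minutes" observed on the "Live Nodes" page. A 15-second recheck interval with the 3-second heartbeat would bring this down to 60 s.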
On Tue, Mar 29, 2011 at 5:37 PM, Ravi Prakash <[email protected]> wrote:

> I set these parameters for quickly discovering live / dead nodes.
>
> For 0.20 : heartbeat.recheck.interval
> For 0.22 : dfs.namenode.heartbeat.recheck-interval, dfs.heartbeat.interval
>
> Cheers,
> Ravi
>
>
> On 3/29/11 10:24 AM, "Michael Segel" <[email protected]> wrote:
>
> Rita,
>
> When the NameNode doesn't see a heartbeat for 10 minutes, it then
> recognizes that the node is down.
>
> Per the Hadoop online documentation:
> "Each DataNode sends a Heartbeat message to the NameNode periodically. A
> network partition can cause a subset of DataNodes to lose connectivity
> with the NameNode. The NameNode detects this condition by the absence of
> a Heartbeat message. The NameNode marks DataNodes without recent
> Heartbeats as dead and does not forward any new IO requests to them. Any
> data that was registered to a dead DataNode is not available to HDFS any
> more. DataNode death may cause the replication factor of some blocks to
> fall below their specified value. The NameNode constantly tracks which
> blocks need to be replicated and initiates replication whenever
> necessary. The necessity for re-replication may arise due to many
> reasons: a DataNode may become unavailable, a replica may become
> corrupted, a hard disk on a DataNode may fail, or the replication factor
> of a file may be increased."
>
> I was trying to find out if there's an hdfs-site parameter that could be
> set to decrease this time period, but wasn't successful.
>
> HTH
>
> -Mike
>
>
> ----------------------------------------
> > Date: Tue, 29 Mar 2011 08:13:43 -0400
> > Subject: live/dead node problem
> > From: [email protected]
> > To: [email protected]
> >
> > Hello All,
> >
> > Is there a parameter or procedure to check more aggressively for a
> > live/dead node? Despite my killing the Hadoop process, I see the node
> > listed as active for 10+ minutes in the "Live Nodes" page.
> > Fortunately, the "Last Contact" value keeps incrementing.
> >
> > Using branch-0.21, 0985326
> >
> > --
> > --- Get your facts first, then you can distort them as you please.--
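To make Ravi's list of parameter names concrete, here is a minimal sketch that renders them as `hdfs-site.xml` `<property>` entries. Several assumptions apply. The recheck interval is taken to be in milliseconds and `dfs.heartbeat.interval` in seconds. The recheck interval is assumed to be read by the NameNode. `dfs.heartbeat.interval` is assumed to be read by the DataNodes, which use it to decide how often to send heartbeats, and also by the NameNode. The key names are the ones quoted in the thread, and 0.21 is not covered because the thread does not name its keys. `KEYS_BY_VERSION` and `heartbeat_properties` are hypothetical names, and the 15 s value is illustrative only.

```python
# Sketch: render hdfs-site.xml <property> entries for the heartbeat settings
# named in this thread. ASSUMPTIONS: recheck interval in milliseconds,
# dfs.heartbeat.interval in seconds; verify against hdfs-default.xml for your
# release. KEYS_BY_VERSION and heartbeat_properties are hypothetical names.
import xml.etree.ElementTree as ET
from typing import Optional

# Key names exactly as listed in the thread (0.21 is not covered there).
KEYS_BY_VERSION = {
    "0.20": {"recheck": "heartbeat.recheck.interval"},
    "0.22": {
        "recheck": "dfs.namenode.heartbeat.recheck-interval",
        "heartbeat": "dfs.heartbeat.interval",
    },
}


def heartbeat_properties(version: str, recheck_ms: int,
                         heartbeat_s: Optional[int] = None) -> str:
    """Return a <configuration> fragment with one <property> per value supplied."""
    values = {"recheck": recheck_ms, "heartbeat": heartbeat_s}
    root = ET.Element("configuration")
    for role, name in KEYS_BY_VERSION[version].items():
        value = values[role]
        if value is None:
            continue  # leave this setting at its default
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative values: 15 s recheck interval, default 3 s heartbeat.
    print(heartbeat_properties("0.22", recheck_ms=15_000, heartbeat_s=3))
```

The printed `<property>` elements would then be merged into the existing `<configuration>` element of `hdfs-site.xml`. Generating them from one table keeps the version-specific key names in a single place rather than scattered across hand-edited files.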
