If I remember correctly, having dfs.safemode.threshold.pct = 1 may lead to a problem where the NameNode does not leave safemode because of floating-point round-off errors.
Having dfs.safemode.threshold.pct > 1 means that the NameNode can never exit safemode, since that threshold is not achievable.

Nicholas Sze

----- Original Message ----
> From: Raghu Angadi <[email protected]>
> To: [email protected]
> Sent: Tuesday, October 6, 2009 6:03:52 PM
> Subject: Re: A question on dfs.safemode.threshold.pct
>
> I am not sure what the real concern is... You can set it to 1.0 (or even 1.1
> :)) if you prefer. Many admins do.
>
> Raghu.
>
> On Tue, Oct 6, 2009 at 5:20 PM, Manhee Jo wrote:
>
> > Thank you, Raghu.
> > Then, when the percentage is below 0.999, how can you tell
> > if some datanodes are just slower than others or some of the data blocks
> > are lost?
> > I think "percentage 1" should have special meaning like
> > it guarantees integrity of data in HDFS.
> > If it's below 1, then the integrity is not said to be guaranteed.
> >
> > Or are there any other useful means that a NameNode can fix the lost blocks,
> > so that it doesn't care even 0.1% of data is lost?
> >
> >
> > Thanks,
> > Manhee
> >
> > ----- Original Message ----- From: "Raghu Angadi"
> > To:
> > Sent: Wednesday, October 07, 2009 1:26 AM
> > Subject: Re: A question on dfs.safemode.threshold.pct
> >
> >
> >
> >> Yes, it is mostly geared towards replication greater than 1. One of the
> >> reasons for waiting for this threshold is to avoid HDFS starting unnecessary
> >> replications of blocks at the start up when some of the datanodes are slower
> >> to start up.
> >>
> >> When the replication is 1, you don't have that issue. A block either exists
> >> or does not.
> >>
> >> Raghu
> >> 2009/10/5 Manhee Jo
> >>
> >> Hi all,
> >>>
> >>> Why isn't the dfs.safemode.threshold.pct 1 by default?
> >>> When dfs.replication.min=1 with dfs.safemode.threshold.pct=0.999,
> >>> there might be chances for a NameNode to check in with incomplete data
> >>> in its file system. Am I right? Is it permissible? Or is it assuming that
> >>> replication would be always more than 1?
> >>>
> >>>
> >>> Thanks,
> >>> Manhee
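
For illustration only, the threshold check discussed in this thread can be sketched roughly as below. This is not the NameNode's actual code: the class name, the method thresholdReached, and its parameters are hypothetical, and the sketch simply assumes that safemode ends once the fraction of reported blocks reaches the configured threshold. It shows why a value above 1 can never be satisfied. It does not try to reproduce the round-off problem mentioned for a threshold of exactly 1, since the thread does not say where that arises.

```java
public class SafeModeThresholdSketch {

    /**
     * Hypothetical check (an assumption, not Hadoop's implementation):
     * returns true once the fraction of blocks reported by datanodes
     * reaches the configured threshold.
     */
    static boolean thresholdReached(long safeBlocks, long totalBlocks, double thresholdPct) {
        // Treat an empty namespace as fully reported (assumption for this sketch).
        double ratio = (totalBlocks == 0) ? 1.0 : (double) safeBlocks / totalBlocks;
        return ratio >= thresholdPct;
    }

    public static void main(String[] args) {
        long total = 1000;

        // 0.999: one block out of 1000 may still be unreported.
        System.out.println(thresholdReached(999, total, 0.999));  // true
        System.out.println(thresholdReached(998, total, 0.999));  // false

        // 1.0: every block must be reported.
        System.out.println(thresholdReached(1000, total, 1.0));   // true

        // 1.1: the ratio never exceeds 1, so this is never satisfied.
        System.out.println(thresholdReached(1000, total, 1.1));   // false
    }
}
```

Under these assumptions, a threshold of 1.1 keeps the check false even when every block has been reported, which matches the point above that a value greater than 1 is not achievable.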
