On large clusters, it helps if the NN waits another N seconds after the threshold percentage has been satisfied (that is, once enough of the files' blocks have their minimum number of replicas available), so that the remaining DNs get some extra time to report their blocks as well. That helps ease the initial client load the cluster receives. This is where the extension comes in useful (it is certainly tunable to a more suitable value).
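To make the timing concrete, here is a rough toy model in Python. It is only a sketch of the idea, not the actual NN code: the function name, its parameters and the sample block reports are all made up for illustration, and it ignores details such as the safe block count dropping again during the wait.

```python
from typing import Iterable, Optional, Tuple


def earliest_exit_ms(
    reports: Iterable[Tuple[int, int]],
    total_blocks: int,
    threshold_pct: float,
    extension_ms: int,
) -> Optional[int]:
    """Toy model of the safemode exit time (hypothetical helper).

    reports: (time_ms, safe_block_count) pairs in time order, where a
    "safe" block is one that has reached its minimum replica count.
    Returns the time at which this model would leave safemode, or None
    if the threshold is never reached.
    """
    needed = threshold_pct * total_blocks
    for time_ms, safe_blocks in reports:
        if safe_blocks >= needed:
            # Threshold met: wait out the extension before leaving.
            return time_ms + extension_ms
    return None


# Made-up block report timeline for a namespace of 1000 blocks.
reports = [(0, 0), (5000, 600), (12000, 998), (20000, 1000)]

print(earliest_exit_ms(reports, 1000, 0.999, 30000))  # 50000
print(earliest_exit_ms(reports, 1000, 0.999, 0))      # 20000
```

With the 30000 ms extension, the model stays in safemode for 30 seconds past the point where the threshold is reached, which is the window in which late DNs can still report in. With an extension of 0 it leaves as soon as the threshold is met.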
For small clusters (a single rack or so) you can probably set it to 0 to shed the extra wait.

However, if you're ever doing NN recovery work (one reason the NN may have been down in the first place), I recommend setting the threshold itself (dfs.safemode.threshold.pct) to a value greater than 1 (1.1f, say) to make sure the NN doesn't auto-exit safemode until you're sure that the new inode/block counts are alright and that you haven't made any mistakes in the recovery process. You can then exit safemode manually when sure. In safemode, the NN does not issue block deletions, so mistakes (such as accidentally starting with an old copy of the fsimage) would not cause data loss.

On Fri, Sep 21, 2012 at 1:47 PM, Bertrand Dechoux <decho...@gmail.com> wrote:
> Hi,
>
> I would like to know the relevance of dfs.safemode.extension.
> Why would someone wait after leaving the safemode?
> Why is it recommended not to set it to 0 instead of 30000 (30 seconds)?
>
> Regards
>
> Bertrand

--
Harsh J