>> Writing image is a (small) fraction of the other start up components.
>> You can find the startup timeline numbers in other jiras. What is the
>> high level problem you are solving?

With other improvements underway, large-cluster startup time is down to about
30 minutes.  Of this, 5 minutes is spent writing the new FSImage files, even
after the improvements of HDFS-1071.  That is roughly one-sixth of the total,
so image writing has become a significant, if not huge, part of the startup
time.

>> For those who have been running hadoop for 4+ years, Namenode being able
>> to write back updated fsimage back saved us during upgrades. Please don't
>> remove this completely, make it optional.

Completely agree that backup copies of this info are vital.  However:

(1) Since the Edits files are also replicated, it is reasonable to think that
a matched set of FSImage & Edits is sufficient for this protection; it is not
vital to have them compacted into an updated FSImage.  I believe the proposal
is not to eliminate redundant backups; it is simply to stop treating the
compaction operation as something that must be done at startup time.

(2) Most of us running production clusters already use Checkpoint Namenodes to
do the compaction (combining FSImage + Edits => new FSImage, and writing out
redundant copies of the new FSImage) in the background, in order to keep the
size of the Edits logs under control.  For those sites it is even less
important to do a compaction during startup.  In fact, it seems to me that only
sites that do NOT use any sort of Checkpoint Namenode actually need to do
compaction from the Primary Namenode, at startup or otherwise.

So I think Daryn's suggestion is worthwhile.  Doing the compaction at startup
should still remain an option for sites not using Checkpoint Namenodes.
