[ 
https://issues.apache.org/jira/browse/HDFS-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010762#comment-13010762
 ] 

Matt Foley commented on HDFS-1780:
----------------------------------

>> Writing the image is a (small) fraction of the other startup components.
>> You can find the startup timeline numbers in other jiras. What is the
>> high-level problem you are solving?

With other improvements underway, large cluster startup time is down to about 
30 minutes.  Of this, 5 minutes is writing the new FSImage files, even after 
the improvements of HDFS-1071.  So this has become a significant, if not huge, 
part of the startup time.

>> For those who have been running hadoop for 4+ years, the Namenode being
>> able to write back an updated fsimage saved us during upgrades. Please
>> don't remove this completely, make it optional.

Completely agree that backup copies of this info are vital.  However:

(1) Since the Edits files are also replicated, it is reasonable to think that 
having a matched set of FSImage & Edits is sufficient for this protection; it 
is not vital to have them compacted into an updated FSImage.  I believe the 
proposal is not to eliminate redundant backups, but simply to stop treating 
the compacting operation as something vital to do at startup time.

(2) Most of us running production clusters already use Checkpoint Namenodes 
to do the compacting operation in the background (combining FSImage + Edits 
=> new FSImage, and writing out redundant copies of the new FSImage) in order 
to keep the size of the Edits logs under control, so it is even less 
important to do a compaction operation during startup.  In fact, it seems to 
me that only sites that do NOT use any sort of Checkpoint Namenode actually 
have any need to do compaction from the Primary Namenode, at startup or 
otherwise.

So I think Daryn's suggestion is worthwhile.  Doing the compaction at startup 
should still remain an option, for sites not using Checkpoint Namenodes.

> reduce need to rewrite fsimage on startup
> -----------------------------------------
>
>                 Key: HDFS-1780
>                 URL: https://issues.apache.org/jira/browse/HDFS-1780
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Daryn Sharp
>
> On startup, the namenode will read the fs image, apply edits, then rewrite 
> the fs image.  This takes a non-trivial amount of time for very large 
> directory structures.  Perhaps the namenode should employ some logic to 
> decide whether the edits are simple enough that rewriting the image back 
> out to disk isn't warranted.
> A few ideas:
> * Use the size of the edit logs: if the size is below a threshold, assume 
> it's cheaper to reprocess the edit log than to write the image back out.
> * Time the processing of the edits: if the time is below a defined 
> threshold, the image isn't rewritten.
> * Time the reading of the image and the processing of the edits.  Base the 
> decision on the time it would take to write the image (a multiplier 
> applied to the read time?) versus the time it would take to reprocess the 
> edits.  If a certain threshold (perhaps a percentage, or the expected time 
> to rewrite) is exceeded, rewrite the image.
> Something along the lines of the last suggestion may allow for defaults 
> that adapt to any cluster size, eliminating the need to keep tweaking a 
> cluster's settings based on its size.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
