reduce need to rewrite fsimage on startup
-----------------------------------------
Key: HDFS-1780
URL: https://issues.apache.org/jira/browse/HDFS-1780
Project: Hadoop HDFS
Issue Type: New Feature
Reporter: Daryn Sharp
On startup, the namenode reads the fsimage, applies the edits, then rewrites
the fsimage. This takes a non-trivial amount of time for very large directory
structures. Perhaps the namenode should employ some logic to decide whether the
edits are small enough that rewriting the image back out to disk isn't
warranted.
A few ideas:
1. Use the size of the edit logs: if the size is below a threshold, assume it's
   cheaper to reprocess the edit log than to write the image back out.
2. Time the processing of the edits; if the time is below a defined threshold,
   don't rewrite the image.
3. Time both the reading of the image and the processing of the edits. Base the
   decision on the time it would take to write the image (a multiplier applied
   to the read time?) versus the time it would take to reprocess the edits. If
   a certain threshold (perhaps a percentage of the expected time to rewrite)
   is exceeded, rewrite the image.
Something along the lines of the last suggestion may allow for defaults that
adapt to a cluster of any size, thus eliminating the need to keep tweaking a
cluster's settings based on its size.
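
To make the last suggestion concrete, here is a rough sketch of how the
decision might look. Everything in it is hypothetical: the class
ImageRewritePolicy, its method names, and both tunables (the write-to-read
multiplier and the rewrite threshold) are assumptions for illustration and
are not existing Hadoop code or configuration keys.

```java
/**
 * Hypothetical sketch of the timing-based heuristic; none of these names
 * exist in Hadoop. The image write time is estimated from the measured read
 * time, and the image is rewritten only when replaying the edits costs more
 * than a chosen fraction of that estimate.
 */
final class ImageRewritePolicy {
    // Assumed tunable: estimated write time = measured read time * this ratio.
    private final double writeToReadRatio;
    // Assumed tunable: rewrite when the edits replay time exceeds this
    // fraction of the estimated write time.
    private final double rewriteThreshold;

    ImageRewritePolicy(double writeToReadRatio, double rewriteThreshold) {
        if (writeToReadRatio <= 0 || rewriteThreshold <= 0) {
            throw new IllegalArgumentException("tunables must be positive");
        }
        this.writeToReadRatio = writeToReadRatio;
        this.rewriteThreshold = rewriteThreshold;
    }

    /** Estimated time to write the image, derived from the read time. */
    long estimatedWriteMillis(long imageReadMillis) {
        return Math.round(imageReadMillis * writeToReadRatio);
    }

    /** True if the image should be rewritten after the edits are applied. */
    boolean shouldRewrite(long imageReadMillis, long editsReplayMillis) {
        double estimatedWrite = imageReadMillis * writeToReadRatio;
        return editsReplayMillis > estimatedWrite * rewriteThreshold;
    }
}
```

Because both inputs are measured on the cluster itself, the same pair of
defaults would scale with image size: a larger image takes longer to read,
which raises the replay time needed before a rewrite is triggered.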