[jira] Commented: (HADOOP-3022) Fast Cluster Restart

Konstantin Shvachko (JIRA) Sun, 11 May 2008 10:10:18 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595920#action_12595920 ]


Konstantin Shvachko commented on HADOOP-3022:
---------------------------------------------

After the two optimizations, HADOOP-3364 and HADOOP-3369, the total startup 
time is improved by a factor of 2.
The biggest gains are in image saving and block processing, each of which is 
almost 4 times faster.

The table below summarizes sizes and compares new and old time measurements.

|| ||value (new)||old value||
|objects|10 mln| |
|files & dirs|4 mln| |
|blocks|6 mln| |
|heap size|3.275 GB| |
|image size|0.6 GB| |
|edits size per day|0.27 GB| |
|# data-nodes|500| |
|blocks per node|36,000| |
|image load time|111 sec|132 sec|
|edits load time|75 sec|84 sec|
|image save time|18 sec|70 sec|
|block processing|87 sec|320 sec|
|total startup time|291 sec (~5 min)|606 sec (~10 min)|
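
As a quick check of the "factor of 2" and "almost 4 times" figures, the 
per-phase speedups can be recomputed from the timings in the table. This is 
only an illustrative sketch: the dictionary keys and the speedup() helper are 
names chosen here, not anything from the Hadoop code base.

```python
# Phase timings in seconds, copied from the table above as (new, old).
timings = {
    "image load": (111, 132),
    "edits load": (75, 84),
    "image save": (18, 70),
    "block processing": (87, 320),
}


def speedup(new_sec, old_sec):
    """Return how many times faster the new measurement is than the old one."""
    return old_sec / new_sec


# Totals are the sums of the four phases.
new_total = sum(new for new, _ in timings.values())
old_total = sum(old for _, old in timings.values())

for phase, (new, old) in timings.items():
    print(f"{phase}: {speedup(new, old):.1f}x faster")
print(f"total startup: {speedup(new_total, old_total):.1f}x faster")
```

The phase sums reproduce the totals in the table (291 sec and 606 sec), and 
image saving and block processing come out at roughly 3.9x and 3.7x.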

This brings the optimized startup time to about 5 minutes, which breaks down 
as follows:
|load fsimage| 38%|
|load edits| 26%|
|save new fsimage| 6%|
|process block reports| 30%|
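
The percentages above follow directly from the new timings. A minimal sketch 
of the arithmetic, with variable names chosen here purely for illustration:

```python
# New (optimized) phase timings in seconds, taken from the measurements table.
new_timings = {
    "load fsimage": 111,
    "load edits": 75,
    "save new fsimage": 18,
    "process block reports": 87,
}

total = sum(new_timings.values())

# Share of total startup time per phase, rounded to whole percent.
shares = {phase: round(100 * sec / total) for phase, sec in new_timings.items()}

for phase, pct in shares.items():
    print(f"{phase}: {pct}%")
```

Loading the image and the edits together accounts for about 64% of startup, 
which is why the loading part is the natural next target.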

I think more improvements can be made here, especially in the loading part.
For the edits log, we should optimize ADD and CLOSE transactions, as noted in 
HADOOP-3364.
For image loading, the bottleneck is probably block processing, but that 
needs to be evaluated.
Leaving this issue open for now.


> Fast Cluster Restart
> --------------------
>
>                 Key: HADOOP-3022
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3022
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Robert Chansler
>            Assignee: Konstantin Shvachko
>             Fix For: 0.18.0
>
>
> This item introduces a discussion of how to reduce the time necessary to 
> start a large cluster from tens of minutes to a handful of minutes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
