[
https://issues.apache.org/jira/browse/HDFS-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881489#action_12881489
]
Dmytro Molkov commented on HDFS-1071:
-------------------------------------
Well, what I mean by the parent thread holding the lock is the following:
the saveNamespace method is synchronized on the FSNamesystem, and currently,
while holding this lock, the handler thread walks the tree N times and writes N
files. So in a way we assume that the tree is guarded from all
modifications by the FSNamesystem lock.
The same is true for the patch, except in this case the tree is walked by
N different threads. We are operating under the same assumption: while we
hold the FSNamesystem lock the tree is not being modified, and the handler
thread waits for all worker threads to finish writing to their files
before returning from the section synchronized on FSNamesystem.
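The scheme above can be sketched roughly as follows. This is a minimal illustration, not the actual HDFS patch: the class and method names (ParallelSaveSketch, saveNamespaceParallel, saveImage) are hypothetical, and writing a fixed payload stands in for serializing the namespace tree.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParallelSaveSketch {
    // Hypothetical stand-in for writing one fsimage copy: in HDFS this
    // would be the per-directory image save.
    static void saveImage(Path dir, byte[] payload) throws IOException {
        Files.createDirectories(dir);
        Files.write(dir.resolve("fsimage"), payload);
    }

    // One worker thread per configured storage directory. The caller is
    // assumed to hold the FSNamesystem lock, so the tree cannot change
    // while the workers run.
    static void saveNamespaceParallel(List<Path> dirs, byte[] payload)
            throws InterruptedException {
        List<Thread> workers = new ArrayList<>();
        for (Path dir : dirs) {
            Thread t = new Thread(() -> {
                try {
                    saveImage(dir, payload);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            t.start();
            workers.add(t);
        }
        // The handler thread blocks here until every copy is on disk, so
        // the lock is not released before all writers have finished.
        for (Thread t : workers) {
            t.join();
        }
    }

    public static void main(String[] args) throws Exception {
        Path base = Files.createTempDirectory("nn-storage");
        List<Path> dirs = Arrays.asList(base.resolve("d1"), base.resolve("d2"));
        saveNamespaceParallel(dirs, "image-bytes".getBytes());
        for (Path dir : dirs) {
            System.out.println(Files.exists(dir.resolve("fsimage")));
        }
    }
}
```

The total save time then becomes the time of the slowest directory rather than the sum over all directories.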
We just deployed this patch internally to our production cluster:
2010-06-22 10:12:59,714 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 11906663754 saved in 140 seconds.
2010-06-22 10:13:50,626 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 11906663754 saved in 191 seconds.
This saved us 140 seconds on the current image: the two copies were written
concurrently, so the whole save took 191 seconds instead of the sequential
140 + 191 = 331 seconds.
As far as both copies being on the same drive is concerned - I guess this patch
will not give much of an improvement in that case.
However, I am not sure there is much value in storing two copies of the image
on the same drive. Please correct me if I am wrong, but I thought that multiple
copies of the image should be stored on different drives to help in case of a
drive failure (or on a filer to protect against the machine dying). Storing two
copies on the same drive only helps with file corruption (accidental deletion),
and that is a weak argument for keeping multiple copies on one physical drive.
I like your approach with one thread doing serialization and others doing
writes, but it seems a lot more complicated than the one in this patch,
because I am simply executing one call in a newly spawned thread. With the
serializer-writer approach there are more implementation questions, such as
what to do with multiple writers that consume their queues at different
speeds: you cannot grow a queue indefinitely, since the namenode would simply
run out of memory, but on the other hand you want to feed the faster
consumers as quickly as possible.
The main benefit I see is doing the serialization of the tree only once, but
since we are holding the FSNamesystem lock at that time the NameNode doesn't
do much anyway. It is also no worse than what was in place before
(serialization took place once per image location).
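For what it's worth, the memory-bounding question has a standard answer: per-writer bounded queues, where the serializer's put() blocks once a slow writer falls behind. The sketch below is only an illustration of that idea under assumed names (SerializerWriterSketch, run); it is not the design anyone has implemented here, and an in-memory byte counter stands in for the disk write.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class SerializerWriterSketch {
    // Sentinel marking the end of the serialized stream (hypothetical).
    static final byte[] EOF = new byte[0];

    // One serializer walks the tree once and fans each chunk out to
    // per-writer bounded queues; put() blocks when a queue is full, so
    // the serializer can never buffer an unbounded number of chunks.
    static List<Integer> run(int numWriters, int numChunks, int chunkSize)
            throws InterruptedException {
        List<BlockingQueue<byte[]>> queues = new ArrayList<>();
        List<Thread> writers = new ArrayList<>();
        List<Integer> bytesWritten = new ArrayList<>();
        for (int i = 0; i < numWriters; i++) {
            BlockingQueue<byte[]> q = new ArrayBlockingQueue<>(4);
            queues.add(q);
            Thread w = new Thread(() -> {
                int bytes = 0;
                try {
                    while (true) {
                        byte[] chunk = q.take();
                        if (chunk == EOF) break;
                        bytes += chunk.length; // stand-in for a disk write
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                synchronized (bytesWritten) { bytesWritten.add(bytes); }
            });
            w.start();
            writers.add(w);
        }
        // Serializer loop: each chunk is handed to every writer's queue.
        for (int n = 0; n < numChunks; n++) {
            byte[] chunk = new byte[chunkSize];
            for (BlockingQueue<byte[]> q : queues) q.put(chunk);
        }
        for (BlockingQueue<byte[]> q : queues) q.put(EOF);
        for (Thread w : writers) w.join();
        return bytesWritten;
    }

    public static void main(String[] args) throws Exception {
        // Two writers each receive 100 chunks of 16 bytes.
        System.out.println(run(2, 100, 16));
    }
}
```

Note the trade-off this makes visible: with a shared fan-out, the serializer advances at the pace of the slowest writer, whereas the one-thread-per-directory patch lets each copy proceed independently at the cost of serializing the tree N times.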
> savenamespace should write the fsimage to all configured fs.name.dir in
> parallel
> --------------------------------------------------------------------------------
>
> Key: HDFS-1071
> URL: https://issues.apache.org/jira/browse/HDFS-1071
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: name-node
> Reporter: dhruba borthakur
> Assignee: Dmytro Molkov
> Attachments: HDFS-1071.2.patch, HDFS-1071.3.patch, HDFS-1071.4.patch,
> HDFS-1071.5.patch, HDFS-1071.patch
>
>
> If you have a large number of files in HDFS, the fsimage file is very big.
> When the namenode restarts, it writes a copy of the fsimage to all
> directories configured in fs.name.dir. This takes a long time, especially if
> there are many directories in fs.name.dir. Make the NN write the fsimage to
> all these directories in parallel.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.