[ 
https://issues.apache.org/jira/browse/HDFS-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924775#action_12924775
 ] 

Jakob Homan commented on HDFS-1071:
-----------------------------------

bq. Could you please verify this. If the images are the same I'm fine with the 
implementation.
In the patch, {{FSNameSystem::saveNamespace()}} acquires the write lock 
before calling {{FSImage::saveNamespace(renewCheckpointTime)}}.  The writing is 
done in parallel, and each of the writer threads is joined (in 
{{waitForThreads}}) before the method returns and the write lock is 
released.  So this should be safe.
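
As a rough illustration of why this is safe, here is a minimal sketch of the locking pattern described above. It is not the patch's code: the class {{ParallelSaveSketch}}, the field names, and the stand-in "write" (recording the directory name) are all hypothetical. The only point is that every writer thread is joined before the write lock is released.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not the HDFS-1071 patch itself.
public class ParallelSaveSketch {
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    // Thread-safe record of which directories were "written".
    private final List<String> written = new CopyOnWriteArrayList<>();

    public List<String> saveNamespace(List<String> dirs) {
        fsLock.writeLock().lock();           // no namespace changes from here on
        try {
            List<Thread> savers = new ArrayList<>();
            for (String dir : dirs) {
                // Stand-in for writing the image to one storage directory.
                Thread t = new Thread(() -> written.add(dir));
                savers.add(t);
                t.start();
            }
            // Analogous to waitForThreads: block until every writer is done.
            for (Thread t : savers) {
                try {
                    t.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while saving", e);
                }
            }
            return new ArrayList<>(written);
        } finally {
            fsLock.writeLock().unlock();     // released only after all joins
        }
    }
}
```

Because the joins happen inside the {{try}} block, no writer can still be running once the lock is released, so every copy is taken from the same namespace state.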

There are other calls to {{saveNamespace}} that should be considered, though.   
{{FSImage::saveNamespace(renewCheckpointTime)}} is also called from 
{{FSDirectory::loadFSImage}} (which is itself called by FSNameSystem's 
constructors), {{BackupStorage::saveCheckpoint()}}, 
{{CheckpointStorage::doMerge()}}, and {{FSImage::doImportCheckpoint}}.  
Assuming no new operations are coming in, which they shouldn't be, the 
checkpoint and backupnode calls are safe.  The others are safe as well, 
assuming we're in safemode.  Does this sound reasonable?

I believe this addresses Konstantin's concerns.

A couple of nits with the current patch (6):
* Java's Collections documentation is pretty adamant that synchronized 
collections be traversed while holding a lock on the collection 
(http://download.oracle.com/javase/6/docs/api/java/util/Collections.html#synchronizedList(java.util.List)).
The patch doesn't currently do this in {{processIOErrors}} for the {{sds}} 
parameter.  This isn't necessary at the moment, as only one thread is 
guaranteed to be iterating, but it may be better to synchronize now to avoid 
problems in the future.
* The MiniDFSCluster constructors have been deprecated since this patch was 
generated.  The patch should be updated to use the new Builder.
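
For the first nit, the idiom the Collections javadoc asks for looks roughly like the sketch below. The class {{SyncListIteration}}, the method {{countFailed}}, and the use of strings with a "failed:" prefix are hypothetical stand-ins; the real {{processIOErrors}} and the element type of {{sds}} are not reproduced here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of iterating a Collections.synchronizedList safely.
public class SyncListIteration {
    // sds is assumed to have been created by Collections.synchronizedList(...).
    static int countFailed(List<String> sds) {
        int failed = 0;
        // Individual calls such as add() are synchronized by the wrapper,
        // but iteration is not, so hold the list's own lock for the
        // whole traversal.
        synchronized (sds) {
            for (String sd : sds) {
                if (sd.startsWith("failed:")) {
                    failed++;
                }
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> sds = Collections.synchronizedList(new ArrayList<String>());
        sds.add("failed:dir1");
        sds.add("ok:dir2");
        System.out.println(countFailed(sds));
    }
}
```

With a single iterating thread the {{synchronized}} block costs almost nothing, and it keeps the traversal correct if another thread later starts modifying the list.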
 

> savenamespace should write the fsimage to all configured fs.name.dir in 
> parallel
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-1071
>                 URL: https://issues.apache.org/jira/browse/HDFS-1071
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: Dmytro Molkov
>         Attachments: HDFS-1071.2.patch, HDFS-1071.3.patch, HDFS-1071.4.patch, 
> HDFS-1071.5.patch, HDFS-1071.6.patch, HDFS-1071.patch
>
>
> If you have a large number of files in HDFS, the fsimage file is very big. 
> When the namenode restarts, it writes a copy of the fsimage to all 
> directories configured in fs.name.dir. This takes a long time, especially if 
> there are many directories in fs.name.dir. Make the NN write the fsimage to 
> all these directories in parallel.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
