[ 
https://issues.apache.org/jira/browse/HDFS-11714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15987756#comment-15987756
 ] 

Kihwal Lee commented on HDFS-11714:
-----------------------------------

The original pre-HA design was to just create the directory structure under the new 
directory. The inspector then reports that some directories are new, which causes the 
namesystem to call {{saveNamespace()}}, which unconditionally writes a VERSION file 
in all storage directories.  This still happens in non-HA mode.

For HA, the first part still happens, but {{saveNamespace()}} is not called 
automatically.
{noformat}
[main] INFO namenode.FSImage: Storage directory /xxx/hadoop/var/hdfs/namedir1 
is not formatted.
[main] INFO namenode.FSImage: Formatting ...
...
WARN namenode.FSImage: Storage directory Storage 
Directory/xxx/hadoop/var/hdfs/namedir1 contains no VERSION file. Skipping...
{noformat}
The last line is logged while the namenode searches for an fsimage to load.

When a checkpoint is uploaded, the retention manager fails to delete old files 
in the directory.
{noformat}
INFO namenode.TransferFsImage: Downloaded file 
fsimage.ckpt_00000001234567890123 size 20000000000000 bytes.
INFO namenode.FSImageTransactionalStorageInspector: No version file in 
/xxx/hadoop/var/hdfs/namedir1
INFO namenode.NNStorageRetentionManager: Going to retain 2 images with txid >= 
1234567890122
INFO namenode.NNStorageRetentionManager: Purging old image
{noformat}
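
A rough way to spot an affected directory from the outside is to look for storage directories that have no VERSION file and count the fsimages sitting in them. The following is only a sketch, not part of any Hadoop tooling: the function names are made up for illustration, and it assumes the usual on-disk layout where VERSION and the {{fsimage_<txid>}} files live under {{<storage dir>/current/}}.

```python
from pathlib import Path


def dirs_missing_version(storage_dirs):
    """Return the storage directories that have no current/VERSION file.

    Assumes the VERSION file lives at <storage dir>/current/VERSION.
    """
    missing = []
    for d in storage_dirs:
        version_file = Path(d) / "current" / "VERSION"
        if not version_file.is_file():
            missing.append(str(d))
    return missing


def count_fsimages(storage_dir):
    """Count fsimage_<txid> files under current/, ignoring .md5 sidecars."""
    current = Path(storage_dir) / "current"
    if not current.is_dir():
        return 0
    return sum(
        1
        for p in current.iterdir()
        if p.name.startswith("fsimage_") and not p.name.endswith(".md5")
    )
```

Passing the directories configured for the namenode to {{dirs_missing_version()}} and then calling {{count_fsimages()}} on each result should show whether images are accumulating beyond the number the retention manager is supposed to retain.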

> Newly added NN storage directory won't get initialized and cause space 
> exhaustion
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-11714
>                 URL: https://issues.apache.org/jira/browse/HDFS-11714
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.3
>            Reporter: Kihwal Lee
>            Priority: Critical
>
> When an empty namenode storage directory is detected on normal NN startup, it 
> may not be fully initialized. The new directory is still part of the "in-service" 
> NNStorage, and when a checkpoint image is uploaded, a copy will also be written 
> there.  However, the retention manager won't be able to purge old files since 
> the directory lacks a VERSION file.  This causes fsimages to pile up in the 
> directory.  With a big namespace, the disk will fill up within 
> days or weeks.


