[
https://issues.apache.org/jira/browse/HDFS-11714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15987756#comment-15987756
]
Kihwal Lee commented on HDFS-11714:
-----------------------------------
The original pre-HA design was to just create the directory structure under the
new directory. The inspector then reports that some directories are new, which
causes the namesystem to call {{saveNamespace()}}, which unconditionally writes
a VERSION file in all storage directories. This still happens in non-HA mode.
In HA mode, the first part still happens, but the namenode does not call
{{saveNamespace()}} automatically.
{noformat}
[main] INFO namenode.FSImage: Storage directory /xxx/hadoop/var/hdfs/namedir1
is not formatted.
[main] INFO namenode.FSImage: Formatting ...
...
WARN namenode.FSImage: Storage directory Storage
Directory/xxx/hadoop/var/hdfs/namedir1 contains no VERSION file. Skipping...
{noformat}
The last line is logged when the namenode searches for an fsimage to load.
When a checkpoint is uploaded, the retention manager fails to delete old files
in the directory.
{noformat}
INFO namenode.TransferFsImage: Downloaded file
fsimage.ckpt_00000001234567890123 size 20000000000000 bytes.
INFO namenode.FSImageTransactionalStorageInspector: No version file in
/xxx/hadoop/var/hdfs/namedir1
INFO namenode.NNStorageRetentionManager: Going to retain 2 images with txid >=
1234567890122
INFO namenode.NNStorageRetentionManager: Purging old image
{noformat}
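To make the failure mode concrete, here is a minimal toy model in Java. It is not the actual {{NNStorageRetentionManager}} code: {{RetentionSketch}}, {{StorageDir}}, {{saveCheckpoint()}} and {{purgeOldImages()}} are hypothetical names, and the only behavior modeled is what the logs above suggest, namely that checkpoints are written to every in-service directory while purging skips any directory without a VERSION file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model of the behavior described above; not Hadoop code. */
public class RetentionSketch {

    /** Hypothetical stand-in for one namenode storage directory. */
    static final class StorageDir {
        final String path;
        final boolean hasVersionFile;
        final List<Long> imageTxIds = new ArrayList<>();

        StorageDir(String path, boolean hasVersionFile) {
            this.path = path;
            this.hasVersionFile = hasVersionFile;
        }
    }

    /** A checkpoint upload writes a copy into every in-service directory. */
    static void saveCheckpoint(List<StorageDir> dirs, long txId) {
        for (StorageDir d : dirs) {
            d.imageTxIds.add(txId);
        }
    }

    /** Keeps the newest numToRetain images, but only in directories with a VERSION file. */
    static void purgeOldImages(List<StorageDir> dirs, int numToRetain) {
        for (StorageDir d : dirs) {
            if (!d.hasVersionFile) {
                continue; // assumption: mirrors the "No version file in ..." skip
            }
            Collections.sort(d.imageTxIds);
            while (d.imageTxIds.size() > numToRetain) {
                d.imageTxIds.remove(0); // drop the oldest image
            }
        }
    }

    public static void main(String[] args) {
        List<StorageDir> dirs = Arrays.asList(
                new StorageDir("/existing/namedir", true),
                new StorageDir("/new/namedir", false)); // never got a VERSION file
        for (long txId = 1; txId <= 5; txId++) {
            saveCheckpoint(dirs, txId);
            purgeOldImages(dirs, 2);
        }
        for (StorageDir d : dirs) {
            System.out.println(d.path + " holds " + d.imageTxIds.size() + " images");
        }
    }
}
```

In this model the directory with a VERSION file stays at the retention limit, while the one without it keeps every checkpoint ever written, which matches the space exhaustion reported in this issue.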
> Newly added NN storage directory won't get initialized and cause space
> exhaustion
> ---------------------------------------------------------------------------------
>
> Key: HDFS-11714
> URL: https://issues.apache.org/jira/browse/HDFS-11714
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.3
> Reporter: Kihwal Lee
> Priority: Critical
>
> When an empty namenode storage directory is detected on normal NN startup, it
> may not be fully initialized. The new directory is still part of the
> "in-service" NNStorage, and when a checkpoint image is uploaded, a copy will
> also be written there. However, the retention manager won't be able to purge
> old files since the directory lacks a VERSION file. This causes fsimages to
> pile up in the directory. With a big namespace, the disk will fill up on the
> order of days or weeks.