[
https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808320#comment-13808320
]
Uma Maheswara Rao G commented on HDFS-5443:
-------------------------------------------
We are seeing a couple of issues with open files and snapshots in our test
clusters. Some of them were already filed recently.
Here, snapshotted under-construction files are getting loaded as INodeFile
instead of INodeFileUnderConstructionWithSnapshot.
Because of this, blockTotal is not decremented for these files, so the NN
does not exit safemode.
HDFS-5283 was fixed as a workaround: it increments the safe block count when
a reported block belongs to a snapshotted file that is under construction.
But the special case here is that the block was allocated only in the
Namenode and the pipeline to the DNs was not yet established, so the physical
block was never created on the Datanodes and no DN will report it. Because
the under-construction snapshotted file is loaded as an INodeFile once the
original INodeFile is deleted, the NN still expects this block to be
reported.
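To make the accounting problem concrete, here is a minimal sketch of the
arithmetic. It is a simplified model only: SafeModeSketch, expectedBlocks and
canLeaveSafeMode are hypothetical names, not HDFS code, and the threshold of
1.0 is an assumption chosen to keep the numbers small.

```java
// Simplified, hypothetical model of the safemode accounting; none of these
// names are actual HDFS classes or methods.
final class SafeModeSketch {

    /** Blocks the NN expects DNs to report: all known blocks minus the
     *  under-construction blocks it managed to identify while loading. */
    static long expectedBlocks(long blocksInBlockMap, long identifiedUcBlocks) {
        return blocksInBlockMap - identifiedUcBlocks;
    }

    /** Safemode can end once the reported blocks reach the threshold fraction. */
    static boolean canLeaveSafeMode(long reported, long expected, double threshold) {
        return reported >= (long) Math.ceil(expected * threshold);
    }

    public static void main(String[] args) {
        long blocksInBlockMap = 5; // 4 blocks on DNs + 1 allocated only at the NN
        long reported = 4;         // DNs can report only blocks that physically exist
        double threshold = 1.0;    // assumed; a real cluster may use a lower value

        // Snapshotted file loaded as a plain INodeFile: its block is not subtracted.
        long wrong = expectedBlocks(blocksInBlockMap, 0);
        // Same file recognized as under construction: its block is subtracted.
        long right = expectedBlocks(blocksInBlockMap, 1);

        System.out.println("loaded as INodeFile, can leave: "
                + canLeaveSafeMode(reported, wrong, threshold));
        System.out.println("loaded as under construction, can leave: "
                + canLeaveSafeMode(reported, right, threshold));
    }
}
```

With these numbers the first check fails and the second passes, which matches
the behaviour we see: the one block that exists only in the Namenode keeps the
expected count out of reach.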
A couple of points to discuss in solving this issue:
Option 1) Why can't we save an under-construction boolean flag (true or
false) for every INode?
Then, while reading the INodes, we can place INodeFileUnderConstruction
instances directly in the tree, increment an under-construction block count
as we go, and use that count to decrement blockTotal. Currently blockTotal
is calculated as blockMap.blocktotal - [lease file block total].
That way we need not depend on the lease files.
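A rough sketch of what Option 1 could look like, using a toy image format
rather than the real fsimage layout. UcFlagSketch, FileEntry and the field
order are all hypothetical, and the sketch assumes that only the last block of
an under-construction file counts as an under-construction block.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical toy format; this is not the real fsimage layout.
final class UcFlagSketch {

    /** Minimal stand-in for a persisted file inode. */
    static final class FileEntry {
        final String path;
        final int blockCount;
        final boolean underConstruction;

        FileEntry(String path, int blockCount, boolean underConstruction) {
            this.path = path;
            this.blockCount = blockCount;
            this.underConstruction = underConstruction;
        }
    }

    /** Writes every entry together with its under-construction flag. */
    static byte[] save(List<FileEntry> entries) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(entries.size());
            for (FileEntry e : entries) {
                out.writeUTF(e.path);
                out.writeInt(e.blockCount);
                out.writeBoolean(e.underConstruction); // the flag proposed in Option 1
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Counts under-construction blocks while loading, without consulting leases. */
    static int countUnderConstructionBlocks(byte[] image) {
        int ucBlocks = 0;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(image))) {
            int entries = in.readInt();
            for (int i = 0; i < entries; i++) {
                in.readUTF(); // path, not needed for the count
                int blockCount = in.readInt();
                boolean underConstruction = in.readBoolean();
                // Assumption: only the last block of such a file is under construction.
                if (underConstruction && blockCount > 0) {
                    ucBlocks++;
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return ucBlocks;
    }
}
```

The count returned by the loader is what would be subtracted from the block
map total, so a file that survives only in a snapshot is still accounted for.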
Option 2) Maybe we have to handle snapshotted under-construction files with
some special care here. If the original file is deleted without being
finalized, the snapshot can be left holding the file in under-construction
state with nothing tracking it for the safe block count. Maybe we can collect
the files that were deleted from the live tree but are still present in a
snapshot in under-construction state, then save and load them separately.
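As a sketch of the collection step in Option 2, assuming we can enumerate
the live paths and the snapshot entries. SnapshotUcTrackerSketch and
SnapshotEntry are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; these are not HDFS classes.
final class SnapshotUcTrackerSketch {

    /** Minimal stand-in for a file as recorded in a snapshot. */
    static final class SnapshotEntry {
        final String path;
        final boolean underConstruction;

        SnapshotEntry(String path, boolean underConstruction) {
            this.path = path;
            this.underConstruction = underConstruction;
        }
    }

    /** Returns snapshot files that are under construction and no longer
     *  exist in the live tree; these are the ones to save separately. */
    static List<String> deletedButUnderConstruction(
            Set<String> livePaths, List<SnapshotEntry> snapshotEntries) {
        List<String> result = new ArrayList<>();
        for (SnapshotEntry entry : snapshotEntries) {
            if (entry.underConstruction && !livePaths.contains(entry.path)) {
                result.add(entry.path);
            }
        }
        return result;
    }
}
```

Files that are still live and under construction are left out on purpose,
since their leases already cover them.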
> Namenode can stuck in safemode on restart if it crashes just after addblock
> logsync and after taking snapshot for such file.
> ----------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-5443
> URL: https://issues.apache.org/jira/browse/HDFS-5443
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Affects Versions: 3.0.0, 2.2.0
> Reporter: Uma Maheswara Rao G
> Assignee: sathish
>
> This issue was reported by Prakash and Sathish.
> On looking into the issue, the following things are happening:
>
> 1) The client added a block at the NN and the NN just did logsync,
> so the NN has the block ID persisted.
> 2) Before the addBlock response is returned to the client, take a snapshot
> of the root or a parent directory of that file.
> 3) Delete the parent directory of that file.
> 4) Now crash the NN without responding success to the client for that
> addBlock call.
> On restart, the Namenode will be stuck in safemode.
--
This message was sent by Atlassian JIRA
(v6.1#6144)