[ 
https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808320#comment-13808320
 ] 

Uma Maheswara Rao G commented on HDFS-5443:
-------------------------------------------

We are seeing a couple of issues with open files and snapshots in our test 
clusters. Some of them were filed recently.

Here, snapshotted under-construction files are getting loaded as INodeFile 
instead of INodeFileUnderConstructionWithSnapshot.
Because of this, the block total is not decremented for these files, so the NN 
does not exit safemode.
HDFS-5283 was fixed as a workaround: it increments the safe block count if the 
reported block belongs to a snapshotted file that is under construction. The 
special case here is that the block was allocated only in the Namenode and the 
pipeline to the DNs was not yet established, so the physical block was never 
created on the Datanodes and no DN will report it. But because the 
under-construction snapshotted file is loaded as an INodeFile once the original 
INodeFile is deleted, the NN still expects this block to be reported.
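
The accounting problem above can be illustrated with a small toy model. This 
is only a sketch and not Hadoop code: the class SafeModeSketch, the LoadedFile 
fields and the 100% threshold are all assumptions made for illustration (the 
real NN uses a configurable threshold and its own block map).

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of safemode block accounting. Hypothetical, not the Hadoop API. */
final class SafeModeSketch {

    /** One file as seen after the image is loaded (illustrative fields only). */
    static final class LoadedFile {
        final int completeBlocks;      // blocks that exist physically on DNs
        final int incompleteBlocks;    // allocated in the NN only, never written
        final boolean trackedAsUnderConstruction;

        LoadedFile(int completeBlocks, int incompleteBlocks,
                   boolean trackedAsUnderConstruction) {
            this.completeBlocks = completeBlocks;
            this.incompleteBlocks = incompleteBlocks;
            this.trackedAsUnderConstruction = trackedAsUnderConstruction;
        }
    }

    /** Blocks the NN waits for before leaving safemode. */
    static int expectedBlocks(List<LoadedFile> files) {
        int total = 0;
        for (LoadedFile f : files) {
            total += f.completeBlocks;
            // Incomplete blocks are subtracted only when the file is known
            // to be under construction; a plain INodeFile keeps them.
            if (!f.trackedAsUnderConstruction) {
                total += f.incompleteBlocks;
            }
        }
        return total;
    }

    /** Blocks the DNs can actually report (assumption: only complete ones). */
    static int reportedBlocks(List<LoadedFile> files) {
        int total = 0;
        for (LoadedFile f : files) {
            total += f.completeBlocks;
        }
        return total;
    }

    /** Assumption: safemode ends only when every expected block is reported. */
    static boolean canLeaveSafeMode(List<LoadedFile> files) {
        return reportedBlocks(files) >= expectedBlocks(files);
    }

    public static void main(String[] args) {
        // Snapshot copy loaded as a plain file: the NN waits for 3 blocks, gets 2.
        System.out.println(canLeaveSafeMode(
                Arrays.asList(new LoadedFile(2, 1, false))));
        // Same file tracked as under construction: the NN waits for 2 blocks.
        System.out.println(canLeaveSafeMode(
                Arrays.asList(new LoadedFile(2, 1, true))));
    }
}
```

In this model the only difference between the two cases is the type the 
snapshot copy is loaded as, which matches the behaviour described above.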

A couple of points to discuss in solving this issue.

Option 1) Why can't we save an under-construction boolean flag (true or false) 
for every INode?
    Then, while reading the INodes themselves, we can place 
INodeFileUnderConstruction instances in the tree, increment the 
under-construction block count at that point, and use it to decrement 
blockTotal. Currently this count is calculated as 
blockMap.blocktotal - [lease file block total]. 
That way we need not depend on the lease files.
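
A minimal sketch of what Option 1 could look like, assuming a made-up record 
layout (name, block count, flag). The class UcFlagImageSketch and its methods 
are hypothetical and do not reflect the real fsimage format or loader; the 
sketch also assumes that only the last block of an under-construction file is 
incomplete.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical image format that stores an under-construction flag per inode. */
final class UcFlagImageSketch {

    static final class INodeRecord {
        final String name;
        final int blocks;
        final boolean underConstruction;

        INodeRecord(String name, int blocks, boolean underConstruction) {
            this.name = name;
            this.blocks = blocks;
            this.underConstruction = underConstruction;
        }
    }

    /** Writes every inode together with its flag (made-up layout). */
    static byte[] save(INodeRecord[] inodes) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(inodes.length);
            for (INodeRecord n : inodes) {
                out.writeUTF(n.name);
                out.writeInt(n.blocks);
                out.writeBoolean(n.underConstruction);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Returns {blockTotal, underConstructionBlocks}, counted during load. */
    static int[] loadCounts(byte[] image) {
        int blockTotal = 0;
        int underConstructionBlocks = 0;
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(image))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                in.readUTF();                       // name, unused in this sketch
                int blocks = in.readInt();
                boolean underConstruction = in.readBoolean();
                blockTotal += blocks;
                // Assumption: only the last block of such a file is incomplete.
                if (underConstruction && blocks > 0) {
                    underConstructionBlocks++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new int[] {blockTotal, underConstructionBlocks};
    }

    public static void main(String[] args) {
        INodeRecord[] inodes = {
            new INodeRecord("/a", 3, false),
            new INodeRecord("/snap/b", 1, true),
        };
        int[] counts = loadCounts(save(inodes));
        // Blocks to wait for = blockTotal - underConstructionBlocks.
        System.out.println(counts[0] - counts[1]);
    }
}
```

With the flag stored next to each inode, the count to subtract from blockTotal 
comes out of the load itself and no lease lookup is needed, which is the point 
of this option.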

Option 2) Maybe we have to handle snapshotted under-construction files with 
some special care here. If the original file is deleted without being 
finalized, the snapshot can end up holding the file in under-construction 
state with nothing tracking it for the safe block count. Maybe we can collect 
the files which were deleted originally but are still present in a snapshot in 
under-construction state, and save and load them separately.



> Namenode can stuck in safemode on restart if it crashes just after addblock 
> logsync and after taking snapshot for such file.
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5443
>                 URL: https://issues.apache.org/jira/browse/HDFS-5443
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Uma Maheswara Rao G
>            Assignee: sathish
>
> This issue was reported by Prakash and Sathish.
> On looking into the issue, the following things are happening.
>
> 1) Client added a block at the NN and just did logsync.
>    So, the NN has the block ID persisted.
> 2) Before returning the addBlock response to the client, take a snapshot of 
> the root or a parent directory of that file.
> 3) Delete the parent directory of that file.
> 4) Now crash the NN without responding success to the client for that 
> addBlock call.
> Now, on restart, the Namenode will be stuck in safemode.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
