[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813613#comment-13813613 ]
Vinay commented on HDFS-5443:
-----------------------------
The scenario may also be the following:
1. The Namenode crashes while a block has 0 locations, and the client also crashes.
2. After the restart, a snapshot is taken.
3. The original file is deleted.
4. A checkpoint happens.
5. The NN is restarted again with the latest fsimage.
At this point, the under-construction file in the snapshot is loaded as a
complete file with COMPLETE blocks.
Since the crashed block was never written to any DN, no block report will ever
arrive for it. So the NN stays in safemode forever.
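As a rough illustration of why loading the block as COMPLETE matters, here is a
simplified Python model of the safe-mode block check. It is a sketch under
stated assumptions, not the FSNamesystem code: it assumes that only COMPLETE
blocks count toward the safe-mode total, that a block becomes "safe" once it has
min_replication reported replicas, and it treats the default value of
dfs.namenode.safemode.threshold-pct (0.999) as a plain fraction. Real rounding,
the datanode threshold, and the safe-mode extension period are ignored. The
Block class and safe_fraction_met function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    complete: bool              # True if the NN loaded the block as COMPLETE
    reported_replicas: int = 0  # replicas reported by DataNodes so far


def safe_fraction_met(blocks, threshold_pct=0.999, min_replication=1):
    """Simplified model (assumption): only COMPLETE blocks count toward the
    safe-mode total, and a block is safe once it has at least
    min_replication reported replicas."""
    counted = [b for b in blocks if b.complete]
    if not counted:
        return True  # nothing to wait for
    safe = sum(1 for b in counted if b.reported_replicas >= min_replication)
    return safe >= threshold_pct * len(counted)


# Block 2 was allocated (addBlock was logsynced) but never written to any DN.
as_under_construction = [Block(1, True, 3), Block(2, False, 0)]
print(safe_fraction_met(as_under_construction))  # True: block 2 is not counted

loaded_as_complete = [Block(1, True, 3), Block(2, True, 0)]
print(safe_fraction_met(loaded_as_complete))  # False: 1 safe < 0.999 * 2
```

Because no DataNode will ever report block 2, nothing in this model can flip
the second check to True, which mirrors the "safemode forever" symptom.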
The latest patch attached to HDFS-5428 will solve most of the problems related
to deleting and renaming open files with snapshots and checkpoints. The current
issue will also be solved by it, and the patch contains a test for this case too.
Please review HDFS-5428.
> Namenode can get stuck in safemode on restart if it crashes just after addblock
> logsync and after taking snapshot for such file.
> ----------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-5443
> URL: https://issues.apache.org/jira/browse/HDFS-5443
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Affects Versions: 3.0.0, 2.2.0
> Reporter: Uma Maheswara Rao G
> Assignee: sathish
>
> This issue was reported by Prakash and Sathish.
> On looking into the issue, the following things are happening:
>
> 1) The client added a block at the NN, and the NN just did logsync,
> so the NN has the block ID persisted.
> 2) Before the addBlock response is returned to the client, take a snapshot of
> the root or the parent directories of that file.
> 3) Delete the parent directory of that file.
> 4) Now crash the NN without returning success to the client for that addBlock
> call.
> Now, on restart, the Namenode gets stuck in safemode.
--
This message was sent by Atlassian JIRA
(v6.1#6144)