[jira] [Commented] (HDFS-5443) Namenode can stuck in safemode on restart if it crashes just after addblock logsync and after taking snapshot for such file.

Vinay (JIRA) Mon, 04 Nov 2013 20:07:31 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813613#comment-13813613
 ]


Vinay commented on HDFS-5443:
-----------------------------

The following scenario may also lead to this:
1. The Namenode crashes while a block has 0 locations, and the client crashes too.
2. After the restart, a snapshot is taken.
3. The original file is deleted.
4. A checkpoint happens.
5. The NN is restarted again with the latest fsimage.

At this point the under-construction file in the snapshot is loaded as a
complete file with COMPLETE blocks.
Since the block from the crash was never written to any DN, no block report
will ever arrive for it, so the NN will stay in safemode forever.
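
To make the sequence concrete, here is a toy model of the five steps. It is
only a sketch and not Hadoop code: ToyFile, ToyNameNode and stuck_in_safemode
are hypothetical names, and the model assumes that every block of a file loaded
as complete must be reported by some DN before safemode is left (the real
NameNode compares against a configurable threshold instead).

```python
from dataclasses import dataclass, field


@dataclass
class ToyFile:
    blocks: list                      # block IDs allocated to the file
    under_construction: bool = True   # True while the writer still has it open


@dataclass
class ToyNameNode:
    live: dict = field(default_factory=dict)      # path -> ToyFile
    snapshot: dict = field(default_factory=dict)  # path -> ToyFile held by a snapshot

    def take_snapshot(self):
        # The snapshot references every current file, open or not.
        self.snapshot = dict(self.live)

    def delete(self, path):
        # The file leaves the live namespace but survives in the snapshot.
        del self.live[path]

    def checkpoint_and_reload(self):
        # Assumed behaviour, taken from the scenario above: an open file that
        # exists only in the snapshot comes back from the fsimage as complete.
        for path, f in self.snapshot.items():
            if path not in self.live:
                f.under_construction = False

    def expected_blocks(self):
        # In this model only blocks of complete files are waited for.
        files = list(self.live.values()) + list(self.snapshot.values())
        return {b for f in files if not f.under_construction for b in f.blocks}


def stuck_in_safemode(nn, reported):
    # Simplification: every expected block must be reported by some DN.
    return bool(nn.expected_blocks() - set(reported))


# Steps 1-5: blk_1 was allocated but never reached any DN.
nn = ToyNameNode(live={"/dir/f": ToyFile(blocks=["blk_1"])})
nn.take_snapshot()
nn.delete("/dir/f")
before = stuck_in_safemode(nn, reported=[])   # file still under construction
nn.checkpoint_and_reload()
after = stuck_in_safemode(nn, reported=[])    # file now loaded as complete
```

In this model `before` is False and `after` is True: once the snapshot copy is
treated as complete, the NN waits for a block that no DN can ever report.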


The latest patch attached to HDFS-5428 will solve most of the problems related
to deleting and renaming open files in combination with snapshots and
checkpoints. It also solves the current issue and contains a test for it.
Please review HDFS-5428.

> Namenode can stuck in safemode on restart if it crashes just after addblock 
> logsync and after taking snapshot for such file.
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5443
>                 URL: https://issues.apache.org/jira/browse/HDFS-5443
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Uma Maheswara Rao G
>            Assignee: sathish
>
> This issue was reported by Prakash and Sathish.
> On looking into the issue, the following sequence triggers it:
>
> 1) The client adds a block at the NN and the logsync has just been done,
>    so the NN has the block ID persisted.
> 2) Before the addBlock response is returned to the client, a snapshot is
>    taken of the root or parent directories of that file.
> 3) The parent directory of that file is deleted.
> 4) The NN crashes without responding success to the client for that
>    addBlock call.
>
> On restart, the Namenode will be stuck in safemode.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
