[jira] [Commented] (HDFS-5443) Namenode can get stuck in safemode on restart if it crashes just after addblock logsync and after taking a snapshot for such a file.

Jing Zhao (JIRA) Mon, 04 Nov 2013 23:01:14 -0800

    [ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813700#comment-13813700 ]


Jing Zhao commented on HDFS-5443:
---------------------------------

bq. for one level of directory and file structure this method works, if 
directory structure is large like

Thanks [~sathish.gurram]! I guess the issue is like this:
# If the file is already an INodeFileUnderConstructionWithSnapshot, the current 
code will finally call collectBlocksAndClear and remove the 0-sized block.
# If the file is just an INodeFileUC (but not INodeUCWithSnapshot), when we 
delete its parent directory or an ancestor directory, the current code does 
nothing and leaves the 0-sized block behind.

So I think we should first fix the above issue here, i.e., make sure that the 
0-sized block always gets deleted when we delete a file (unless it's a 
rename). I will write some unit tests to verify this and create a separate 
JIRA if necessary.
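A toy Java model of the rule proposed above may help pin it down. This is a sketch under stated assumptions: the classes are hypothetical stand-ins, not the real INodeFileUnderConstruction types, and restricting the rule to under-construction files is an assumption. The key property is that the check never looks at whether a snapshot still retains the file, so the INodeFileUnderConstructionWithSnapshot case and the plain INodeFileUC case in the list behave the same way.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model of the proposed rule; these are NOT the real HDFS
 * INode classes, and the underConstruction check is an assumption.
 */
public class ZeroBlockCleanupSketch {
    /** A block ID together with the number of bytes written to it. */
    record Block(long id, long numBytes) {}

    /** Minimal stand-in for a file inode: a UC flag plus its block list. */
    static final class ToyFile {
        final boolean underConstruction;
        final List<Block> blocks;

        ToyFile(boolean underConstruction, List<Block> blocks) {
            this.underConstruction = underConstruction;
            this.blocks = new ArrayList<>(blocks); // mutable copy
        }
    }

    /**
     * Proposed rule: on delete (but not rename), drop a trailing 0-sized block
     * of an under-construction file, whether or not a snapshot still holds
     * the file. Returns the blocks to schedule for removal.
     */
    static List<Block> blocksToRemoveOnDelete(ToyFile file, boolean isRename) {
        List<Block> toRemove = new ArrayList<>();
        if (isRename || !file.underConstruction || file.blocks.isEmpty()) {
            return toRemove; // nothing to clean up
        }
        int lastIdx = file.blocks.size() - 1;
        Block last = file.blocks.get(lastIdx);
        if (last.numBytes() == 0) {
            // The client never wrote data, so no DataNode holds a replica.
            toRemove.add(last);
            file.blocks.remove(lastIdx);
        }
        return toRemove;
    }

    public static void main(String[] args) {
        ToyFile uc = new ToyFile(true, List.of(new Block(1, 128), new Block(2, 0)));
        System.out.println(blocksToRemoveOnDelete(uc, false)); // [Block[id=2, numBytes=0]]

        ToyFile renamed = new ToyFile(true, List.of(new Block(3, 0)));
        System.out.println(blocksToRemoveOnDelete(renamed, true)); // []
    }
}
```

Keeping the rename exemption in the same method mirrors the "(unless it's a rename)" condition: a renamed file is still live, so its last block may yet receive data.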

> Namenode can get stuck in safemode on restart if it crashes just after addblock 
> logsync and after taking a snapshot for such a file.
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5443
>                 URL: https://issues.apache.org/jira/browse/HDFS-5443
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Uma Maheswara Rao G
>            Assignee: sathish
>
> This issue was reported by Prakash and Sathish.
> Looking into the issue, the following things happen:
>
> 1) The client added a block at the NN and the NN just did logsync,
>    so the NN has the block ID persisted.
> 2) Before the addblock response is returned to the client, take a snapshot of 
> the root or of the parent directories of that file.
> 3) Delete the parent directory of that file.
> 4) Now crash the NN without responding success to the client for that addBlock 
> call.
> Now, on restart, the Namenode gets stuck in safemode.
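To make the failure mode concrete, here is a toy Java model of the four steps above. Everything in it (the class name, the set-based bookkeeping, the block ID 1001) is hypothetical and is not NameNode code. It assumes that the restarted NameNode waits for a replica of every block still referenced from the namespace, snapshots included; because the client never wrote to the 0-sized block, no DataNode can ever report it.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical toy model of the reported sequence; NOT actual NameNode code.
 * Assumes the restarted NameNode expects a replica for every block still
 * referenced from the namespace, including from snapshots.
 */
public class OrphanBlockRepro {
    static final long NEW_BLOCK_ID = 1001L; // arbitrary illustrative ID

    /** Replays steps 1-4 and returns the blocks no DataNode can report. */
    static Set<Long> missingBlocksAfterRestart() {
        Set<Long> currentTree = new HashSet<>();      // blocks reachable from the live tree
        Set<Long> snapshots = new HashSet<>();        // blocks retained by snapshots
        Set<Long> dataNodeReplicas = new HashSet<>(); // replicas DataNodes will report

        // 1) addBlock is logged and synced; the client never writes any data,
        //    so no replica is ever created.
        currentTree.add(NEW_BLOCK_ID);

        // 2) A snapshot of an ancestor directory captures the file and its block.
        snapshots.addAll(currentTree);

        // 3) Deleting the parent removes the file from the current tree only;
        //    the 0-sized block is left behind in the snapshot copy.
        currentTree.clear();

        // 4) Crash and restart: the block is still referenced via the snapshot.
        Set<Long> expected = new HashSet<>(currentTree);
        expected.addAll(snapshots);
        Set<Long> missing = new HashSet<>(expected);
        missing.removeAll(dataNodeReplicas);
        return missing;
    }

    public static void main(String[] args) {
        // prints "missing blocks: [1001]"
        System.out.println("missing blocks: " + missingBlocksAfterRestart());
    }
}
```

Under that assumption, a non-empty missing set is exactly what keeps the restarted NameNode waiting in safemode.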



--
This message was sent by Atlassian JIRA
(v6.1#6144)
