[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808646#comment-13808646 ]

Uma Maheswara Rao G commented on HDFS-5443:
-------------------------------------------

{quote}
This happens too even when snapshot is not involved. But without capturing the 
underconstruction file in snapshot, this problem is hard to notice.
{quote}
Yes. There is no problem in the case of normal under-construction files, since 
we save them separately and reload them for rebuilding leases; those blocks get 
decremented from the block total used for the safe-block count. But snapshotted 
under-construction files are loaded as normal INodeFiles if we delete the 
original parent directory of the snapshotted file, so we have no tracking for 
them on reload.
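To make the safe-mode side concrete, here is a minimal sketch of the accounting involved, with simplified names assumed for illustration (this is not the actual FSNamesystem.SafeModeInfo code):
{code}
// Minimal sketch of safe-mode accounting; all names here are simplified
// assumptions, not the real FSNamesystem.SafeModeInfo implementation.
class SafeModeSketch {
  long blockTotal;          // complete blocks counted while loading the image
  long blockSafe;           // blocks that reached minimal replication via DN reports
  double threshold = 0.999; // dfs.namenode.safemode.threshold-pct

  // Normal under-construction files are reloaded for lease rebuilding, and
  // their last blocks are subtracted from the total, so an unreported
  // zero-sized block cannot hold the NN in safe mode.
  void adjustForUnderConstruction(int lastBlockCount) {
    blockTotal -= lastBlockCount;
  }

  // A snapshotted under-construction file whose parent directory was deleted
  // is reloaded as a plain INodeFile: no adjustment happens, so blockSafe can
  // never reach the threshold and the NN stays in safe mode.
  boolean canLeaveSafeMode() {
    return blockSafe >= (long) (threshold * blockTotal);
  }
}
{code}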
 


{quote}
This can increase memory usage. Underconstructed files are expected to be a 
very small portion in the namespace.
{quote}
Yeah. For some inodes this is already done, e.g. when the inode is a snapshot 
node. But saving this for all files would increase memory usage.

{quote}
When the snapshot is taken, the block allocated but size is zero (no matter 
it's because pipeline-not-created or sized-not-reported-yet). In this case, the 
zero sized block may not need to be recorded in the snapshot at all.
{quote}
Cleanup of this 0-sized block happens only when we delete the original file. 
But if we delete the original directory itself, this cleanup does not happen.

When we delete a directory:
{code}
Map<INode, INode> priorDeleted = null;
if (snapshot == null) { // delete the current directory
  recordModification(prior, null);
  // delete everything in created list
  DirectoryDiff lastDiff = diffs.getLast();
  if (lastDiff != null) {
    counts.add(lastDiff.diff.destroyCreatedList(this, collectedBlocks,
        removedINodes));
  }
}
{code}

But when we delete the file directly, your proposed adjustment is already done here:
{code}
if (snapshot == null) { // delete the current file
  recordModification(prior, null);
  isCurrentFileDeleted = true;
  Util.collectBlocksAndClear(this, collectedBlocks, removedINodes);
  return Quota.Counts.newInstance();
}
{code}
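
A possible direction, sketched here purely as an illustration (the helper name and calls like removeLastBlock/addDeleteBlock are assumptions modeled on the file-delete path above, not the committed fix), would be to trim a trailing zero-sized block for files that survive only in snapshots when their directory is destroyed:
{code}
// Hypothetical helper, not the actual fix: drop a trailing zero-sized
// under-construction block so it never enters the safe-mode block total
// on restart.
private void trimZeroSizedLastBlock(INodeFile file,
    BlocksMapUpdateInfo collectedBlocks) {
  BlockInfo last = file.getLastBlock();
  if (last != null && !last.isComplete() && last.getNumBytes() == 0) {
    // persisted by addBlock's logsync but never written by any DataNode
    file.removeLastBlock(last);           // assumed mutator
    collectedBlocks.addDeleteBlock(last);
  }
}
{code}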

> Namenode can get stuck in safemode on restart if it crashes just after 
> addblock logsync and after taking a snapshot of such a file.
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5443
>                 URL: https://issues.apache.org/jira/browse/HDFS-5443
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Uma Maheswara Rao G
>            Assignee: sathish
>
> This issue was reported by Prakash and Sathish.
> On looking into the issue, the following things are happening:
> 1) The client added a block at the NN, and the NN just did a logsync,
>    so the NN has the block ID persisted.
> 2) Before returning the addBlock response to the client, take a snapshot of 
> the root or a parent directory of that file.
> 3) Delete the parent directory of that file.
> 4) Now crash the NN without responding success to the client for that 
> addBlock call.
> Now on restart, the Namenode will get stuck in safemode.
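
For reference, the steps above map onto roughly the following MiniDFSCluster sequence. This is only a sketch: the crash in step 4 has to land between the addBlock logsync and the RPC response, which needs fault injection and cannot be expressed with public APIs.
{code}
// Rough reproduction sketch; the crash timing in step 4 needs fault
// injection, so this is illustrative only, not a working test.
Configuration conf = new HdfsConfiguration();
MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
DistributedFileSystem fs = cluster.getFileSystem();
Path parent = new Path("/parent");
fs.mkdirs(parent);
fs.allowSnapshot(new Path("/"));
FSDataOutputStream out = fs.create(new Path(parent, "file"));
out.write(1);
out.hflush();                            // 1) triggers addBlock; NN persists the block via logsync
fs.createSnapshot(new Path("/"), "s1");  // 2) snapshot the root while the file is under construction
fs.delete(parent, true);                 // 3) delete the parent directory
// 4) crash the NN here, before the addBlock response reaches the client
cluster.restartNameNode();               // NN waits for a block no DataNode will ever report
{code}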


