[
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359318#comment-14359318
]
Daryn Sharp commented on HDFS-7587:
-----------------------------------
The patch doesn't apply because the logic is very different due to truncate and
variable length blocks.
At first glance, the new code looks buggy: it sometimes bills quota and
sometimes does not, and if the block exceeds the preferred size it appears you
"earn" back quota. I don't have enough familiarity with all this new code to
provide a timely patch. Un-assigning myself. [~jingzhao], want to take a look?
> Edit log corruption can happen if append fails with a quota violation
> ---------------------------------------------------------------------
>
> Key: HDFS-7587
> URL: https://issues.apache.org/jira/browse/HDFS-7587
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Kihwal Lee
> Assignee: Daryn Sharp
> Priority: Blocker
> Attachments: HDFS-7587.patch
>
>
> We have seen a standby namenode crash due to edit log corruption. It was
> complaining that {{OP_CLOSE}} could not be applied because the file was not
> under construction.
> When a client was trying to append to the file, the remaining space quota was
> very small. This caused a failure in {{prepareFileForWrite()}}, but only after
> the inode had already been converted for writing and a lease had been added.
> Since these changes were not undone when the quota violation was detected, the
> file was left under construction with an active lease, without {{OP_ADD}} ever
> being edit-logged.
> A subsequent {{append()}} eventually caused a lease recovery after the soft
> limit period. This resulted in {{commitBlockSynchronization()}}, which closed
> the file and logged {{OP_CLOSE}}. Since there was no corresponding
> {{OP_ADD}}, edit replay could not apply this {{OP_CLOSE}}.
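
The ordering problem described above can be sketched in miniature. All names below ({{prepareBuggy}}, {{FileState}}, and so on) are illustrative stand-ins, not the actual FSNamesystem/INode code; the point is only that the quota check fires after the state mutations, and that one possible fix is to undo those mutations before rethrowing:

```java
// Miniature model of the HDFS-7587 ordering bug. All names here are
// illustrative stand-ins, not the real NameNode API.
public class AppendQuotaSketch {
    public enum Mode { CLOSED, UNDER_CONSTRUCTION }

    public static class QuotaExceededException extends Exception {}

    public static class FileState {
        public Mode mode = Mode.CLOSED;
        public boolean leaseHeld = false;
    }

    /** Buggy ordering: the inode is converted and a lease granted *before*
     *  the quota check, and nothing is rolled back on failure. */
    public static void prepareBuggy(FileState f, long remaining, long needed)
            throws QuotaExceededException {
        f.mode = Mode.UNDER_CONSTRUCTION; // inode converted for writing
        f.leaseHeld = true;               // lease added
        if (needed > remaining) {
            // Violation detected too late: the state above leaks,
            // and no OP_ADD was ever edit-logged.
            throw new QuotaExceededException();
        }
    }

    /** One possible fix: undo the conversion and release the lease before
     *  rethrowing, so a failed append leaves the file exactly as it was. */
    public static void prepareFixed(FileState f, long remaining, long needed)
            throws QuotaExceededException {
        f.mode = Mode.UNDER_CONSTRUCTION;
        f.leaseHeld = true;
        if (needed > remaining) {
            f.mode = Mode.CLOSED; // roll back the inode conversion
            f.leaseHeld = false;  // release the lease
            throw new QuotaExceededException();
        }
    }

    public static void main(String[] args) {
        FileState buggy = new FileState();
        try { prepareBuggy(buggy, 10, 100); } catch (QuotaExceededException e) {}
        // Left UNDER_CONSTRUCTION with a lease but no OP_ADD: a later lease
        // recovery logs an OP_CLOSE that edit replay cannot apply.
        System.out.println("buggy: " + buggy.mode + ", lease=" + buggy.leaseHeld);

        FileState fixed = new FileState();
        try { prepareFixed(fixed, 10, 100); } catch (QuotaExceededException e) {}
        System.out.println("fixed: " + fixed.mode + ", lease=" + fixed.leaseHeld);
    }
}
```

In this toy model the "buggy" path ends with the file under construction and a lease held despite the thrown exception, which is the inconsistent state described above.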
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)