[ 
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284357#comment-14284357
 ] 

Daryn Sharp commented on HDFS-7587:
-----------------------------------

{{verifyQuota}} is already invoked so the quota counts shouldn't go out of 
sync.  {{updateSpaceConsumed}} calls {{updateCount}}, which calls 
{{verifyQuota}} prior to invoking {{unprotectedUpdateCount}}.  The quotas 
aren't going to change so it seems calling {{verifyQuota}} explicitly is wasted 
processing time.
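
To make the call order above concrete, here is a minimal sketch. The method names come from this comment, but the class {{QuotaDir}}, the exception class, the fields, and all signatures are hypothetical simplifications, not the actual HDFS code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a quota violation; checked, so callers must handle it.
class QuotaExceededSketchException extends Exception {
    QuotaExceededSketchException(String msg) { super(msg); }
}

// Hypothetical, heavily simplified model of a directory with a space quota.
class QuotaDir {
    final long spaceQuota;                         // maximum bytes allowed
    long spaceConsumed;                            // bytes currently charged
    final List<String> trace = new ArrayList<>();  // records the call order

    QuotaDir(long spaceQuota) { this.spaceQuota = spaceQuota; }

    // Entry point: delegates to updateCount.
    void updateSpaceConsumed(long delta) throws QuotaExceededSketchException {
        trace.add("updateSpaceConsumed");
        updateCount(delta);
    }

    // Verifies the quota first and only then applies the change.
    void updateCount(long delta) throws QuotaExceededSketchException {
        trace.add("updateCount");
        verifyQuota(delta);
        unprotectedUpdateCount(delta);
    }

    // Throws if applying delta would exceed the quota; changes nothing.
    void verifyQuota(long delta) throws QuotaExceededSketchException {
        trace.add("verifyQuota");
        if (delta > 0 && spaceConsumed + delta > spaceQuota) {
            throw new QuotaExceededSketchException("space quota exceeded");
        }
    }

    // Applies the change with no checks of its own.
    void unprotectedUpdateCount(long delta) {
        trace.add("unprotectedUpdateCount");
        spaceConsumed += delta;
    }
}
```

In this sketch, an explicit {{verifyQuota}} call placed before {{updateSpaceConsumed}} would repeat the same check against the same counts, which is the redundancy described above.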

bq.  Otherwise, the quota counts will be incorrect if there is an exception 
thrown later on.

Do you have a scenario in mind?  I.e., what is "later on"?  Moving the file to UC 
and associating the lease aren't going to throw checked exceptions.  They might 
throw a runtime exception.  The NN has no concept of a transaction (no 
rollback), so we're fully committed to finishing the op once we start updating 
data structures.  In this patch, once the quota update is successful, we're 
committed to moving the file to UC and assigning a lease.  If we think those 
final steps will throw, then we're in trouble because we can't roll back.  Even 
if that were to happen, an out of sync quota is better than a corrupted 
in-memory state and edit logs caused by the NN throwing runtime exceptions that 
don't cause an abort.
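
As a rough illustration of why the ordering matters without rollback, the sketch below compares the two orderings. Everything in it ({{AppendSketch}}, its fields and methods, the exception class) is hypothetical and only models the behavior described in this issue; it is not the patch itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a quota violation.
class AppendQuotaSketchException extends Exception {
    AppendQuotaSketchException(String msg) { super(msg); }
}

// Hypothetical model of the state touched by an append on one file.
class AppendSketch {
    final long quota;
    long consumed;
    boolean underConstruction;                       // file converted to UC
    boolean leased;                                  // lease assigned
    final List<String> editLog = new ArrayList<>();  // ops logged so far

    AppendSketch(long quota) { this.quota = quota; }

    // The only step in this model that can fail.
    private void chargeQuota(long delta) throws AppendQuotaSketchException {
        if (consumed + delta > quota) {
            throw new AppendQuotaSketchException("space quota exceeded");
        }
        consumed += delta;
    }

    // Ordering from the bug report: state is mutated before the quota check.
    void appendMutateFirst(long delta) throws AppendQuotaSketchException {
        underConstruction = true;
        leased = true;
        chargeQuota(delta);      // a failure here leaves UC + lease behind
        editLog.add("OP_ADD");   // never reached on failure
    }

    // Ordering argued for here: the fallible step runs before any mutation.
    void appendCheckFirst(long delta) throws AppendQuotaSketchException {
        chargeQuota(delta);      // a failure here leaves the file untouched
        underConstruction = true;
        leased = true;
        editLog.add("OP_ADD");
    }
}
```

With a quota that is too small, {{appendMutateFirst}} leaves the file under construction and leased with no {{OP_ADD}} logged, while {{appendCheckFirst}} fails before touching anything.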

> Edit log corruption can happen if append fails with a quota violation
> ---------------------------------------------------------------------
>
>                 Key: HDFS-7587
>                 URL: https://issues.apache.org/jira/browse/HDFS-7587
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Kihwal Lee
>            Assignee: Daryn Sharp
>            Priority: Blocker
>         Attachments: HDFS-7587.patch
>
>
> We have seen a standby namenode crashing due to edit log corruption. It was 
> complaining that {{OP_CLOSE}} cannot be applied because the file is not 
> under-construction.
> When a client was trying to append to the file, the remaining space quota was 
> very small. This caused a failure in {{prepareFileForWrite()}}, but after the 
> inode was already converted for writing and a lease added. Since these were 
> not undone when the quota violation was detected, the file was left in 
> under-construction with an active lease without edit logging {{OP_ADD}}.
> A subsequent {{append()}} eventually caused a lease recovery after the soft 
> limit period. This resulted in {{commitBlockSynchronization()}}, which closed 
> the file with {{OP_CLOSE}} being logged.  Since there was no corresponding 
> {{OP_ADD}}, edit replaying could not apply this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
