[
https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284357#comment-14284357
]
Daryn Sharp commented on HDFS-7587:
-----------------------------------
{{verifyQuota}} is already invoked so the quota counts shouldn't go out of
sync. {{updateSpaceConsumed}} calls {{updateCount}}, which calls
{{verifyQuota}} prior to invoking {{unprotectedUpdateCount}}. The quotas
aren't going to change so it seems calling {{verifyQuota}} explicitly is wasted
processing time.
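To make the call chain concrete, here is a minimal Java sketch. It is a simplified model, not the actual HDFS code: the class name {{QuotaChainSketch}}, the single space counter, and the {{verifyCalls}} field are assumptions made for illustration. Only the four method names come from the description above.

```java
// Hypothetical, simplified model of the call chain; not the real HDFS classes.
class QuotaChainSketch {
    // Checked exception, standing in for a quota violation.
    static class QuotaExceededException extends Exception {
        QuotaExceededException(String msg) { super(msg); }
    }

    private final long spaceQuota;
    private long spaceConsumed = 0;
    int verifyCalls = 0; // illustration only: counts how often the check runs

    QuotaChainSketch(long spaceQuota) { this.spaceQuota = spaceQuota; }

    // Entry point: delegates to updateCount.
    void updateSpaceConsumed(long delta) throws QuotaExceededException {
        updateCount(delta);
    }

    // Verifies the quota first, then applies the change.
    void updateCount(long delta) throws QuotaExceededException {
        verifyQuota(delta);
        unprotectedUpdateCount(delta);
    }

    // Throws if applying delta would exceed the quota; changes nothing.
    void verifyQuota(long delta) throws QuotaExceededException {
        verifyCalls++;
        if (spaceConsumed + delta > spaceQuota) {
            throw new QuotaExceededException("space quota exceeded");
        }
    }

    // Applies the change without any check.
    void unprotectedUpdateCount(long delta) {
        spaceConsumed += delta;
    }

    long getSpaceConsumed() { return spaceConsumed; }

    public static void main(String[] args) throws Exception {
        QuotaChainSketch q = new QuotaChainSketch(100);
        q.updateSpaceConsumed(60);
        System.out.println("verifyQuota calls: " + q.verifyCalls);
    }
}
```

In this model, an explicit {{verifyQuota}} call placed before {{updateSpaceConsumed}} would run the same check a second time against unchanged quotas.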
bq. Otherwise, the quota counts will be incorrect if there is an exception
thrown later on.
Do you have a scenario in mind? I.e., what is "later on"? Moving the file to UC
and associating the lease aren't going to throw checked exceptions. They might
throw a runtime exception. The NN has no concept of a transaction (no
rollback), so we're fully committed to finishing the op once we start updating
data structures. In this patch, once the quota update is successful, we're
committed to moving the file to UC and assigning a lease. If we think those
final steps will throw, then we're in trouble because we can't roll back. Even
if that were to happen, an out-of-sync quota is better than a corrupted
in-memory state and edit logs caused by the NN throwing runtime exceptions that
don't cause an abort.
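The ordering argument can be sketched roughly as follows. This is an illustrative model under stated assumptions, not the patch itself: {{AppendOrderingSketch}}, {{prepareForAppend}}, and the fields are hypothetical names. The point it shows is that the only step that can fail with a checked exception runs before any other state is touched.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model: do the fallible quota step first, then commit.
class AppendOrderingSketch {
    static class QuotaExceededException extends Exception {
        QuotaExceededException(String msg) { super(msg); }
    }

    private final long spaceQuota;
    long spaceConsumed = 0;
    boolean underConstruction = false;
    final Set<String> leases = new HashSet<>();

    AppendOrderingSketch(long spaceQuota) { this.spaceQuota = spaceQuota; }

    void prepareForAppend(String holder, long delta) throws QuotaExceededException {
        // Step 1: the quota update, the only step that throws a checked exception.
        // If it fails, nothing below has run, so there is nothing to undo.
        if (spaceConsumed + delta > spaceQuota) {
            throw new QuotaExceededException("space quota exceeded");
        }
        spaceConsumed += delta;

        // Step 2: past this point there is no rollback in this model.
        // These steps are assumed not to throw checked exceptions.
        underConstruction = true;
        leases.add(holder);
    }

    public static void main(String[] args) {
        AppendOrderingSketch file = new AppendOrderingSketch(10);
        try {
            file.prepareForAppend("client-1", 128);
        } catch (QuotaExceededException e) {
            System.out.println("rejected: uc=" + file.underConstruction
                + ", leases=" + file.leases.size());
        }
    }
}
```

With this ordering, a quota violation leaves the file closed and lease-free, which is the state the edit log expects.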
> Edit log corruption can happen if append fails with a quota violation
> ---------------------------------------------------------------------
>
> Key: HDFS-7587
> URL: https://issues.apache.org/jira/browse/HDFS-7587
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Kihwal Lee
> Assignee: Daryn Sharp
> Priority: Blocker
> Attachments: HDFS-7587.patch
>
>
> We have seen a standby namenode crashing due to edit log corruption. It was
> complaining that {{OP_CLOSE}} cannot be applied because the file is not
> under-construction.
> When a client was trying to append to the file, the remaining space quota was
> very small. This caused a failure in {{prepareFileForWrite()}}, but after the
> inode was already converted for writing and a lease added. Since these were
> not undone when the quota violation was detected, the file was left in
> under-construction with an active lease without edit logging {{OP_ADD}}.
> A subsequent {{append()}} eventually caused a lease recovery after the soft
> limit period. This resulted in {{commitBlockSynchronization()}}, which closed
> the file with {{OP_CLOSE}} being logged. Since there was no corresponding
> {{OP_ADD}}, edit replaying could not apply this.
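
The replay failure described in the issue can be modeled with a short Java sketch. Everything here is a hypothetical simplification: {{EditReplaySketch}}, its {{Edit}} record type, and the set of under-construction paths are assumptions for illustration and do not mirror the real edit log loader. Only the op names {{OP_ADD}} and {{OP_CLOSE}} come from the description.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of replaying edits on a standby.
class EditReplaySketch {
    enum Op { OP_ADD, OP_CLOSE }

    static class Edit {
        final Op op;
        final String path;
        Edit(Op op, String path) { this.op = op; this.path = path; }
    }

    private final Set<String> underConstruction = new HashSet<>();

    boolean isUnderConstruction(String path) {
        return underConstruction.contains(path);
    }

    void replay(List<Edit> edits) {
        for (Edit e : edits) {
            switch (e.op) {
                case OP_ADD:
                    // Opening a file for write marks it under construction.
                    underConstruction.add(e.path);
                    break;
                case OP_CLOSE:
                    // Closing requires a file that is under construction.
                    if (!underConstruction.remove(e.path)) {
                        throw new IllegalStateException(
                            "OP_CLOSE on file not under construction: " + e.path);
                    }
                    break;
            }
        }
    }

    public static void main(String[] args) {
        // An OP_CLOSE with no preceding OP_ADD, as in the reported log.
        List<Edit> corrupt = Arrays.asList(new Edit(Op.OP_CLOSE, "/f"));
        try {
            new EditReplaySketch().replay(corrupt);
        } catch (IllegalStateException ex) {
            System.out.println("replay failed: " + ex.getMessage());
        }
    }
}
```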