[
https://issues.apache.org/jira/browse/HDFS-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709687#comment-13709687
]
Uma Maheswara Rao G commented on HDFS-4516:
-------------------------------------------
{code}
} else {
+ // confirm would failed, it is possible that a failover happened, see
+ // HDFS-4516 for more
+ success = confirmAddBlock();
+ }
+
{code}
The problem with this code is that when persistBlock fails, you do not abandon
the block; you just continue to retry and call createBlock again.
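To make the concern concrete, here is a rough sketch of a retry loop that abandons the failed block before allocating a new one. All names here (BlockAllocator, createBlock, persistBlock, abandonBlock, FakeAllocator) are hypothetical stand-ins for illustration only; this is not the actual NameNode or client code, and not the patch.

```java
import java.util.ArrayList;
import java.util.List;

class AbandonOnPersistFailure {
    /** Hypothetical stand-in for the allocation calls; NOT the real HDFS API. */
    public interface BlockAllocator {
        long createBlock();
        boolean persistBlock(long blockId);
        void abandonBlock(long blockId);
    }

    /**
     * Allocates a block and retries on persist failure. The failed block is
     * abandoned before the next createBlock, so no orphaned allocation is left.
     */
    public static long allocateWithRetry(BlockAllocator nn, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            long blockId = nn.createBlock();
            if (nn.persistBlock(blockId)) {
                return blockId;
            }
            // Without this call, the retry would leave blockId dangling.
            nn.abandonBlock(blockId);
        }
        throw new IllegalStateException("could not persist a block after "
                + maxAttempts + " attempts");
    }

    /** In-memory fake (assumption) whose first N persist calls fail. */
    public static class FakeAllocator implements BlockAllocator {
        private long nextId = 1;
        private int failuresLeft;
        public final List<Long> abandoned = new ArrayList<>();

        public FakeAllocator(int failures) {
            this.failuresLeft = failures;
        }

        @Override
        public long createBlock() {
            return nextId++;
        }

        @Override
        public boolean persistBlock(long blockId) {
            if (failuresLeft > 0) {
                failuresLeft--;
                return false;
            }
            return true;
        }

        @Override
        public void abandonBlock(long blockId) {
            abandoned.add(blockId);
        }
    }

    public static void main(String[] args) {
        FakeAllocator nn = new FakeAllocator(1);
        long id = allocateWithRetry(nn, 3);
        System.out.println("persisted block " + id + ", abandoned " + nn.abandoned);
    }
}
```

In this toy version, the first allocated block is explicitly abandoned once its persist fails, and only then is a second block created.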
Another potential issue with this approach is that generation stamps can
advance beyond this block's genstamp before we actually persist the block.
Since other blocks with higher genstamps have already reached the standby, it
may try to invalidate this block if a DN reports it to the standby in HA mode.
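The ordering problem can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyStandby and its decision rule (accept known blocks, postpone genstamps newer than anything seen, invalidate the rest) are a simplification invented for this example and are not the real standby NameNode logic. The genstamp values are arbitrary.

```java
import java.util.HashSet;
import java.util.Set;

class GenStampRace {
    public enum Decision { ACCEPT, POSTPONE, INVALIDATE }

    /** Toy standby (assumption): tracks persisted genstamps it has seen. */
    public static class ToyStandby {
        private final Set<Long> knownGenStamps = new HashSet<>();
        private long highestSeen = 0;

        /** A persisted block with this genstamp reaches the standby. */
        public void applyPersistedBlock(long genStamp) {
            knownGenStamps.add(genStamp);
            highestSeen = Math.max(highestSeen, genStamp);
        }

        /** Simplified rule for a block reported by a DN. */
        public Decision onDatanodeReport(long genStamp) {
            if (knownGenStamps.contains(genStamp)) {
                return Decision.ACCEPT;
            }
            if (genStamp > highestSeen) {
                // Looks like it is from the future: wait for more edits.
                return Decision.POSTPONE;
            }
            // Unknown block that is older than what was already seen.
            return Decision.INVALIDATE;
        }
    }

    /** The delayed block's persist is overtaken by newer genstamps. */
    public static Decision simulate() {
        ToyStandby standby = new ToyStandby();
        long delayedGenStamp = 1001;        // allocated, not yet persisted
        standby.applyPersistedBlock(1002);  // later blocks arrive first
        standby.applyPersistedBlock(1003);
        return standby.onDatanodeReport(delayedGenStamp);
    }

    public static void main(String[] args) {
        System.out.println("standby decision for delayed block: " + simulate());
    }
}
```

Under these assumptions, the delayed block is neither known nor "from the future" by the time the DN reports it, which is the situation described above.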
I think we need to find a better approach to fix this issue.
I will attach a test case to reproduce it.
> Client crash after block allocation and NN switch before lease recovery for
> the same file can cause readers to fail forever
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-4516
> URL: https://issues.apache.org/jira/browse/HDFS-4516
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.0.0, 2.0.3-alpha
> Reporter: Uma Maheswara Rao G
> Priority: Critical
> Attachments: HDFS-4516.txt
>
>
> If the client crashes just after allocating a block (the block is not yet
> created on the DNs) and the NN also switches after this, then the new
> NameNode will not know about the block locations.
> Further details will be in a comment.