[jira] [Commented] (HDFS-4516) Client crash after block allocation and NN switch before lease recovery for the same file can cause readers to fail forever

Uma Maheswara Rao G (JIRA) Tue, 16 Jul 2013 04:35:11 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709687#comment-13709687
 ]


Uma Maheswara Rao G commented on HDFS-4516:
-------------------------------------------

{code}
   } else {
+          // confirm would failed, it is possible that a failover happened, see
+          // HDFS-4516 for more
+          success = confirmAddBlock();
+        } 
+        
{code}
The problem with this code is that when persistBlock fails, you do not abandon 
the block; you just continue to retry and createBlock again.
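
To illustrate the concern, here is a minimal sketch of a retry loop that 
abandons each unconfirmed block before allocating the next one. It is not the 
patch and not the real client code: BlockAllocator, FlakyAllocator and 
allocateWithAbandon are hypothetical names for illustration, and only 
createBlock, confirmAddBlock and the idea of abandoning the block come from 
the discussion above.

```java
import java.util.ArrayList;
import java.util.List;

public class AbandonOnFailureSketch {
    /** Hypothetical stand-in for the NameNode calls made by the client; not the real ClientProtocol. */
    interface BlockAllocator {
        long createBlock();                // allocate a new block and return its id
        boolean confirmAddBlock(long id);  // true once the allocation has been persisted
        void abandonBlock(long id);        // release an allocation that was never confirmed
    }

    /** Retry loop that abandons every unconfirmed block before allocating the next one. */
    static long allocateWithAbandon(BlockAllocator nn, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            long id = nn.createBlock();
            if (nn.confirmAddBlock(id)) {
                return id;
            }
            // Without this call the failed allocation would be left behind
            // while the loop goes on to create another block.
            nn.abandonBlock(id);
        }
        throw new IllegalStateException(
            "could not confirm a block after " + maxAttempts + " attempts");
    }

    /** Toy allocator whose first few confirmations fail (assumed behavior, for the demo only). */
    static class FlakyAllocator implements BlockAllocator {
        final List<Long> live = new ArrayList<>();
        private long nextId = 1;
        private int failuresLeft;

        FlakyAllocator(int failures) { this.failuresLeft = failures; }

        public long createBlock() {
            long id = nextId++;
            live.add(id);
            return id;
        }

        public boolean confirmAddBlock(long id) {
            if (failuresLeft > 0) {
                failuresLeft--;
                return false;
            }
            return true;
        }

        public void abandonBlock(long id) {
            live.remove(Long.valueOf(id));
        }
    }

    public static void main(String[] args) {
        FlakyAllocator nn = new FlakyAllocator(2);
        long id = allocateWithAbandon(nn, 5);
        // Blocks 1 and 2 were abandoned, so only block 3 remains allocated.
        System.out.println("confirmed block " + id + ", live blocks " + nn.live);
    }
}
```

If the abandonBlock call is removed from the loop, the same run leaves blocks 
1, 2 and 3 all allocated, which is the leftover state described above.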

Another potential issue with this approach is that generation stamps can jump 
higher than this block's genstamp before we actually persist this block. Since 
other blocks with higher genstamps have already reached the standby, it may try 
to invalidate this block if a DN reports it to the standby in HA mode.
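
The ordering problem can be shown with a toy model. This is only a sketch 
under stated assumptions, not real HDFS code: ToyStandby, Action and both 
methods are hypothetical, and the model assumes the standby postpones a 
report for an unknown block whose genstamp is higher than anything it has 
seen, and invalidates an unknown block whose genstamp is not.

```java
import java.util.HashSet;
import java.util.Set;

public class StandbyGenStampSketch {
    enum Action { ACCEPT, POSTPONE, INVALIDATE }

    /** Toy standby NameNode; a simplified, hypothetical model rather than real HDFS classes. */
    static class ToyStandby {
        private final Set<Long> knownBlockIds = new HashSet<>();
        private long highestGenStampSeen = 0;

        /** Apply an edit-log entry for a block that the active NN has persisted. */
        void applyPersistedBlock(long blockId, long genStamp) {
            knownBlockIds.add(blockId);
            highestGenStampSeen = Math.max(highestGenStampSeen, genStamp);
        }

        /** Decide what to do with a block reported by a DN (assumed policy). */
        Action onBlockReport(long blockId, long genStamp) {
            if (knownBlockIds.contains(blockId)) {
                return Action.ACCEPT;
            }
            if (genStamp > highestGenStampSeen) {
                // Looks like a block whose edit has not arrived yet: wait.
                return Action.POSTPONE;
            }
            // Unknown block with an already-passed genstamp: treated as stale.
            return Action.INVALIDATE;
        }
    }

    public static void main(String[] args) {
        ToyStandby standby = new ToyStandby();
        long unpersistedId = 1L, unpersistedGenStamp = 1001L;

        // Nothing newer has reached the standby yet.
        System.out.println(standby.onBlockReport(unpersistedId, unpersistedGenStamp));

        // A later block with a higher genstamp is persisted and reaches the standby first.
        standby.applyPersistedBlock(2L, 1002L);

        // The earlier, still unpersisted block is now reported by a DN.
        System.out.println(standby.onBlockReport(unpersistedId, unpersistedGenStamp));
    }
}
```

Under these assumptions the first report is postponed and the second is 
invalidated, even though the block itself did not change in between.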

I think we need to think of a better approach to fix this issue. 
I will attach a test case to reproduce it.
                
> Client crash after block allocation and NN switch before lease recovery for 
> the same file can cause readers to fail forever
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-4516
>                 URL: https://issues.apache.org/jira/browse/HDFS-4516
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.0.0, 2.0.3-alpha
>            Reporter: Uma Maheswara Rao G
>            Priority: Critical
>         Attachments: HDFS-4516.txt
>
>
> If the client crashes just after allocating a block (blocks not yet created 
> in the DNs) and the NN also switches after this, then the new Namenode will 
> not know about the locs.
> Further details will be in a comment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
