[ 
https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060253#comment-15060253
 ] 

Kihwal Lee commented on HDFS-8999:
----------------------------------

Well, HDFS-1172 is in.  If the NN can correctly fix the locations based on 
incremental block reports (IBRs), and eventually on full block reports 
(FBRs), it will be okay not to wait for IBRs before completing the block.

As for availability, a datanode can crash right after finalizing a block but 
before sending the IBR. But it could also crash right after sending the IBR. 
Timing-wise, the two are not very different. I don't think waiting for the IBR 
adds much value in this regard.

For correctness, we need to think about the new "race" between 
addBlock()/complete() from clients and IBRs from datanodes. E.g. there are 
places that ignore inconsistency while a block is under construction. This 
race could create other issues, especially when recovery is involved.  The NN 
might incorrectly mark a replica as corrupt, or have more locations than 
committed and not know which are valid (this sometimes happens today).

Conceptually, the namenode knows exactly where the replicas of an 
under-construction block are. If any of them changes, the client is supposed 
to call updatePipeline().  So, closing without waiting for IBRs seems 
reasonable.  However, the difficulty arises because the locations can change 
quickly by replication and balancing. I.e. namenode cannot reliably reject 
bogus locations. It has to record all reported locations whether it thinks they 
are corrupt or not.  When recoverClose() is involved, it can be even more 
confusing, as multiple IBRs with different gen stamps come from the datanode. 
The namenode currently doesn't deal with this very well. It needs to be 
properly addressed before or with this JIRA.
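
To make the "record everything, reject nothing" point concrete, here is a 
minimal Java sketch. It is a toy model, not NameNode code: the class 
ReportedReplicaTracker, the Verdict enum, and all method names are made up 
for illustration, and real IBR handling tracks far more state.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy model only; every name here is hypothetical, not an HDFS API.
class ReportedReplicaTracker {
    enum Verdict { MATCHES, GEN_STAMP_MISMATCH, UNEXPECTED_LOCATION }

    private final long committedGenStamp;
    private final Set<String> expectedLocations = new LinkedHashSet<>();
    // Latest verdict per datanode; nothing is ever rejected outright.
    private final Map<String, Verdict> recorded = new LinkedHashMap<>();

    ReportedReplicaTracker(long committedGenStamp, String... expectedLocations) {
        this.committedGenStamp = committedGenStamp;
        for (String dn : expectedLocations) {
            this.expectedLocations.add(dn);
        }
    }

    // Classify one incremental report and record it regardless of verdict.
    Verdict onIncrementalReport(String datanode, long reportedGenStamp) {
        Verdict verdict;
        if (reportedGenStamp != committedGenStamp) {
            verdict = Verdict.GEN_STAMP_MISMATCH;
        } else if (!expectedLocations.contains(datanode)) {
            verdict = Verdict.UNEXPECTED_LOCATION;
        } else {
            verdict = Verdict.MATCHES;
        }
        recorded.put(datanode, verdict); // a later report replaces an earlier one
        return verdict;
    }

    int recordedCount() {
        return recorded.size();
    }

    Verdict verdictFor(String datanode) {
        return recorded.get(datanode);
    }
}
```

In this sketch a datanode that first reports an old gen stamp and then the 
committed one ends up recorded as MATCHES, while a location outside the 
pipeline is still kept, only flagged. Which of those flagged entries are 
actually valid is exactly the part the sketch does not solve.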

> Namenode need not wait for {{blockReceived}} for the last block before 
> completing a file.
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-8999
>                 URL: https://issues.apache.org/jira/browse/HDFS-8999
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Jitendra Nath Pandey
>
> This comes out of a discussion in HDFS-8763. Pasting [~jingzhao]'s comment 
> from the jira:
> {quote}
> ...whether we need to let NameNode wait for all the block_received msgs to 
> announce the replica is safe. Looking into the code, now we have
>    # NameNode knows the DataNodes involved when initially setting up the 
> writing pipeline
>    # If any DataNode fails during the writing, client bumps the GS and 
> finally reports all the DataNodes included in the new pipeline to NameNode 
> through the updatePipeline RPC.
>    # When the client received the ack for the last packet of the block (and 
> before the client tries to close the file on NameNode), the replica has been 
> finalized in all the DataNodes.
> Then in this case, when NameNode receives the close request from the client, 
> the NameNode already knows the latest replicas for the block. Currently the 
> checkReplication call only counts in all the replicas that NN has already 
> received the block_received msg, but based on the above #2 and #3, it may be 
> safe to also count in all the replicas in the 
> BlockUnderConstructionFeature#replicas?
> {quote}
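
The difference between the two counting rules in the quoted description can 
be sketched as follows. This is only an illustration under simplifying 
assumptions: LastBlockCloseCheck, canComplete, and the minReplication 
threshold are hypothetical names, and datanodes are reduced to plain strings 
rather than anything resembling the real BlockManager state.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only; names are hypothetical and do not mirror BlockManager.
class LastBlockCloseCheck {
    // Returns true if enough replicas are counted to complete the last block.
    // With countPipelineReplicas == false, only datanodes that already sent
    // block_received count; with true, pipeline locations count as well.
    static boolean canComplete(Set<String> blockReceivedFrom,
                               Set<String> pipelineLocations,
                               int minReplication,
                               boolean countPipelineReplicas) {
        Set<String> counted = new HashSet<>(blockReceivedFrom);
        if (countPipelineReplicas) {
            counted.addAll(pipelineLocations);
        }
        return counted.size() >= minReplication;
    }
}
```

With no block_received messages yet and a three-node pipeline, the first 
rule makes the close wait while the second lets it proceed immediately.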



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
