[jira] [Commented] (HDFS-8999) Namenode need not wait for {{blockReceived}} for the last block before completing a file.

Walter Su (JIRA) Wed, 30 Dec 2015 08:40:54 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15075180#comment-15075180
 ]


Walter Su commented on HDFS-8999:
---------------------------------

Suppose a client writes 3 files sequentially (the files are related and must be 
written in order):
f0b0, f0b1, f0b2, f0b3, f1b0, f2b0
Then
adding f0b2 will wait for f0b0 to be completed,
adding f0b3 will wait for f0b1 to be completed,
adding f1b0 won't wait for f0b3,
adding f2b0 won't wait for f1b0.

Isn't that strange? If we are going to do this, does it mean {{addBlock(..)}} can 
apply the same change?
If the block size is small or the client writes lots of small files, we will have 
lots of committed blocks. And what is the meaning of "minRepl" then? Why do we need 
both "committed" and "completed"? The whole point is to let the client know the 
data is safe so it can continue.
I am not worried about safety, since an acked empty_last_packet means the block 
files are flushed/closed and safe on the DNs; they are just not reported yet.
I agree that the race condition mentioned by Konstantin Shvachko is possible.
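
To make the asymmetry in the sequence above concrete, here is a minimal sketch 
in Java. It is a toy model, not NameNode code: the class and method names are 
hypothetical, and it assumes (as the sequence above implies) that allocating 
block i of a file waits for block i-2 of the same file to be completed, while 
with the proposed change closing a file does not wait for its last block.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the waits described in the comment; not actual HDFS code. */
final class AddBlockWaitSketch {

    /**
     * Assumed rule: allocating block `index` of a file requires block
     * `index - 2` of the SAME file to be completed. Returns the name of
     * that block, or null when there is nothing to wait for.
     */
    static String waitsFor(int file, int index) {
        if (index < 2) {
            return null; // first two blocks of a file have no such predecessor
        }
        return "f" + file + "b" + (index - 2);
    }

    /** Lists, in write order, what each block allocation waits for. */
    static List<String> trace(int[] blocksPerFile) {
        List<String> out = new ArrayList<>();
        for (int f = 0; f < blocksPerFile.length; f++) {
            for (int b = 0; b < blocksPerFile[f]; b++) {
                String dep = waitsFor(f, b);
                out.add("adding f" + f + "b" + b
                        + (dep == null ? " waits for nothing" : " waits for " + dep));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three files with 4, 1 and 1 blocks, as in the example above.
        for (String line : trace(new int[] {4, 1, 1})) {
            System.out.println(line);
        }
    }
}
```

In this model the waits exist only inside f0; nothing carries across the file 
boundary, which is the inconsistency being questioned.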

> Namenode need not wait for {{blockReceived}} for the last block before 
> completing a file.
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-8999
>                 URL: https://issues.apache.org/jira/browse/HDFS-8999
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Jitendra Nath Pandey
>            Assignee: Tsz Wo Nicholas Sze
>         Attachments: h8999_20151228.patch
>
>
> This comes out of a discussion in HDFS-8763. Pasting [~jingzhao]'s comment 
> from the jira:
> {quote}
> ...whether we need to let NameNode wait for all the block_received msgs to 
> announce the replica is safe. Looking into the code, now we have
>    # NameNode knows the DataNodes involved when initially setting up the 
> writing pipeline
>    # If any DataNode fails during the writing, client bumps the GS and 
> finally reports all the DataNodes included in the new pipeline to NameNode 
> through the updatePipeline RPC.
>    # When the client received the ack for the last packet of the block (and 
> before the client tries to close the file on NameNode), the replica has been 
> finalized in all the DataNodes.
> Then in this case, when NameNode receives the close request from the client, 
> the NameNode already knows the latest replicas for the block. Currently the 
> checkReplication call only counts in all the replicas that NN has already 
> received the block_received msg, but based on the above #2 and #3, it may be 
> safe to also count in all the replicas in the 
> BlockUnderConstructionFeature#replicas?
> {quote}
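
The counting change proposed in the quoted description can be sketched as 
follows. This is an illustration only, not the h8999 patch: the class, the 
method names and the use of plain strings for DataNodes are all hypothetical. 
It assumes the NameNode tracks two things per block: the DataNodes that have 
sent block_received, and the DataNodes in the latest pipeline (the role played 
by BlockUnderConstructionFeature#replicas in the description).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the two replica-counting strategies; not actual HDFS code. */
final class ReplicaCountSketch {

    /** Behaviour as described today: only block_received replicas count. */
    static int countReported(Set<String> reported) {
        return reported.size();
    }

    /**
     * Proposed behaviour (assumption): also count DataNodes known from the
     * latest pipeline, without counting any DataNode twice.
     */
    static int countReportedOrExpected(Set<String> reported, List<String> pipeline) {
        Set<String> union = new HashSet<>(reported);
        union.addAll(pipeline);
        return union.size();
    }

    /** A block is treated as safe once the count reaches minRepl. */
    static boolean meetsMinRepl(int count, int minRepl) {
        return count >= minRepl;
    }

    public static void main(String[] args) {
        Set<String> reported = new HashSet<>(Arrays.asList("dn1"));
        List<String> pipeline = Arrays.asList("dn1", "dn2", "dn3");
        int minRepl = 2; // hypothetical value chosen for the illustration

        System.out.println("reported only: "
                + meetsMinRepl(countReported(reported), minRepl));
        System.out.println("reported or expected: "
                + meetsMinRepl(countReportedOrExpected(reported, pipeline), minRepl));
    }
}
```

With one block_received message and a three-node pipeline, only the second 
strategy reaches a minRepl of 2, so only it would let the close proceed 
without waiting.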



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
