[
https://issues.apache.org/jira/browse/HDFS-8999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065165#comment-15065165
]
Konstantin Shvachko commented on HDFS-8999:
-------------------------------------------
> the goal of this jira is to allow closing the files without waiting for IBRs
> from DN.
Why? Here are some arguments against doing it:
# _NN only waits for one replica_, then it can close the file. Before HDFS-1172
the problem was that NN would immediately start replicating the block without
waiting for the remaining IBRs. This overwhelmed NN even more. HDFS-1172 solved it
by placing replicas in pendingReplication, so DNs have more time to send IBRs.
# If waiting for 1 replica is too much, _one can cause immediate close of a file
by setting minimum replication to 0_ instead of the default 1. It is a
configuration change, no need for code changes.
# _Client can only be trusted to report replica length, but not its locations._
NN and clients know only where replicas should be, but not where they are. NN
trusting clients about locations is induced knowledge, more like in gossip
protocols.
# _Race condition between client and FBR reporting._ Why should one deal with
this and potentially other races, which we haven't thought about yet, if it
works as is?
# _Delaying IBRs will not solve the NN overwhelming problem._ People will just
add more DNs, and the problem comes back.
NN limits HDFS scalability; everybody knows it. Just keep your scale right.
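
To make the first two points concrete, here is a minimal Java sketch of the completion check as argued above. It is an illustrative model only, not NameNode code: the class {{CompletionCheck}} and its method are hypothetical names, and the only behavior taken from the discussion is that a file can be closed once the number of replicas reported through IBRs reaches the minimum replication (default 1).

```java
/** Hypothetical model of the "enough replicas reported?" check; not actual NameNode code. */
final class CompletionCheck {
    // Assumed to mirror the configured minimum replication discussed above.
    private final int minReplication;

    CompletionCheck(int minReplication) {
        if (minReplication < 0) {
            throw new IllegalArgumentException("minReplication must be >= 0");
        }
        this.minReplication = minReplication;
    }

    /** Returns true when the count of replicas reported via IBR reaches the minimum. */
    boolean canComplete(int reportedReplicas) {
        return reportedReplicas >= minReplication;
    }

    public static void main(String[] args) {
        CompletionCheck defaultMin = new CompletionCheck(1);
        CompletionCheck zeroMin = new CompletionCheck(0);

        // With the default minimum, the close has to wait for the first IBR.
        System.out.println(defaultMin.canComplete(0)); // false
        System.out.println(defaultMin.canComplete(1)); // true

        // With a minimum of 0, the close succeeds before any IBR arrives.
        System.out.println(zeroMin.canComplete(0));    // true
    }
}
```

In this model the difference between the two settings is a single threshold, which is why the second point treats it as a configuration matter rather than a code change.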
> Namenode need not wait for {{blockReceived}} for the last block before
> completing a file.
> -----------------------------------------------------------------------------------------
>
> Key: HDFS-8999
> URL: https://issues.apache.org/jira/browse/HDFS-8999
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Jitendra Nath Pandey
> Assignee: Tsz Wo Nicholas Sze
>
> This comes out of a discussion in HDFS-8763. Pasting [~jingzhao]'s comment
> from the jira:
> {quote}
> ...whether we need to let NameNode wait for all the block_received msgs to
> announce the replica is safe. Looking into the code, now we have
> # NameNode knows the DataNodes involved when initially setting up the
> writing pipeline
> # If any DataNode fails during the writing, client bumps the GS and
> finally reports all the DataNodes included in the new pipeline to NameNode
> through the updatePipeline RPC.
> # When the client received the ack for the last packet of the block (and
> before the client tries to close the file on NameNode), the replica has been
> finalized in all the DataNodes.
> Then in this case, when NameNode receives the close request from the client,
> the NameNode already knows the latest replicas for the block. Currently the
> checkReplication call only counts in all the replicas that NN has already
> received the block_received msg, but based on the above #2 and #3, it may be
> safe to also count in all the replicas in the
> BlockUnderConstructionFeature#replicas?
> {quote}
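
The difference between the current counting and the counting proposed in the quoted description can be sketched in Java as follows. This is a simplified model under stated assumptions, not the actual checkReplication logic: {{ReplicaCountSketch}}, its methods, and the use of plain strings as DataNode identifiers are all hypothetical. It only contrasts counting replicas for which a block_received message has arrived with also counting the pipeline locations the NameNode already knows about.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the two counting strategies; not actual NameNode code. */
final class ReplicaCountSketch {

    /** Current behavior as described: only replicas confirmed by block_received count. */
    static int countReportedOnly(Set<String> reported) {
        return reported.size();
    }

    /**
     * Proposed behavior as described: also count the locations the NameNode knows
     * from pipeline setup and updatePipeline. A DataNode present in both sets is
     * counted once.
     */
    static int countWithExpected(Set<String> reported, Set<String> expectedPipeline) {
        Set<String> all = new HashSet<>(reported);
        all.addAll(expectedPipeline);
        return all.size();
    }

    public static void main(String[] args) {
        Set<String> reported = new HashSet<>();
        reported.add("dn1"); // only one block_received has arrived so far

        Set<String> expectedPipeline = new HashSet<>();
        expectedPipeline.add("dn1");
        expectedPipeline.add("dn2");
        expectedPipeline.add("dn3");

        System.out.println(countReportedOnly(reported));                   // 1
        System.out.println(countWithExpected(reported, expectedPipeline)); // 3
    }
}
```

The sketch says nothing about whether the expected locations can be trusted; it only shows what changes in the count if they are.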