[
https://issues.apache.org/jira/browse/HDFS-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208753#comment-14208753
]
Ravi Prakash commented on HDFS-2936:
------------------------------------
Thanks Harsh for this JIRA! I would go a different route on this. To me as a
user, the min-replication count means "it will take that many failures to lose
data". That is a simple concept to reason about. If we create a separate
config that applies only to the write pipeline, there is a window during which
my assumption is not valid (the time it takes for the NN to order that
replication), and it makes the concept slightly harder to understand.
I would suggest that we instead fix the write pipeline so it contains the
minimum replication count of nodes, and that the client wait until that
happens. I realize that might be a much bigger change.
> File close()-ing hangs indefinitely if the number of live blocks does not
> match the minimum replication
> -------------------------------------------------------------------------------------------------------
>
> Key: HDFS-2936
> URL: https://issues.apache.org/jira/browse/HDFS-2936
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 0.23.0
> Reporter: Harsh J
> Assignee: Harsh J
> Attachments: HDFS-2936.patch
>
>
> If an admin wishes to enforce replication today for all the users of their
> cluster, they may set {{dfs.namenode.replication.min}}. This property
> prevents users from creating files with a replication factor lower than the
> configured minimum.
> However, the minimum replication value set above is also checked at several
> other points, especially during completeFile (close) operations. If a
> write's pipeline ends up with fewer than the minimum number of nodes, the
> completeFile operation does not successfully close the file, and the client
> hangs indefinitely waiting for the NN to replicate the last
> under-replicated block in the background. This form of hard guarantee can,
> for example, bring down HBase clusters during high xceiver load on the DNs,
> or when disks fill up on many of them.
> I propose we split the property into two parts:
> * dfs.namenode.replication.min
> ** Keeps the same name, but is checked only against the replication factor
> set at file creation time and during adjustments made via setrep, etc.
> * dfs.namenode.replication.min.for.write
> ** New property that decouples the remaining checks from the above
> property, such as those done during block commit, file complete/close, and
> safemode checks for block availability.
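The proposed split could be sketched in hdfs-site.xml roughly as follows; the
property names come from the proposal above, while the values and descriptions
are illustrative only:

```xml
<!-- Illustrative sketch only; values are examples, not recommendations. -->
<property>
  <name>dfs.namenode.replication.min</name>
  <value>2</value>
  <description>Checked only at file creation time and via setrep.</description>
</property>
<property>
  <name>dfs.namenode.replication.min.for.write</name>
  <value>1</value>
  <description>Checked during block commit, file complete/close, and
  safemode block-availability checks.</description>
</property>
```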
> Alternatively, we could replace the client-side hang of completeFile/close
> calls with a bounded number of retries. That would require further
> discussion of how a failed file close ought to be handled.
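The retry alternative above could look roughly like the following. This is a
minimal sketch, not actual HDFS client code: the `completeFile` supplier is a
hypothetical stand-in for the NameNode completeFile RPC that the real client
loops on inside DFSOutputStream.close().

```java
import java.util.function.BooleanSupplier;

// Sketch of a bounded-retry completeFile loop. A capped attempt count
// replaces the current behavior of waiting indefinitely for the NN to
// replicate the last block.
public class RetryClose {
    public static boolean completeWithRetries(BooleanSupplier completeFile,
                                              int maxAttempts, long backoffMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (completeFile.getAsBoolean()) {
                return true;              // NN accepted the close
            }
            try {
                Thread.sleep(backoffMs);  // let background replication catch up
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;             // interrupted: stop retrying
            }
        }
        return false;  // give up instead of hanging; caller decides what to do
    }
}
```

Returning false on exhaustion is exactly the open question the comment raises:
the caller is left holding a file that the NN has not yet marked complete.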
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)