Provide a better way to specify a HDFS-wide minimum replication requirement
---------------------------------------------------------------------------
Key: HDFS-2936
URL: https://issues.apache.org/jira/browse/HDFS-2936
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Affects Versions: 0.23.0
Reporter: Harsh J
Assignee: Harsh J
Currently, an admin has no good way to enforce a minimum replication factor for
all files on an HDFS cluster. Setting dfs.replication.min arguably works, but it
is a hard guarantee: if the write pipeline cannot satisfy that number of
replicas for some reason (e.g. a failure), close() never succeeds on the file
being written, which leads to several issues.
After discussing with Todd, we feel it would make sense to introduce a second
config (defaulting to ${dfs.replication.min}) that acts as the minimum
replication a user may specify for a file. This differs from
dfs.replication.min, which additionally ensures that that many replicas are
recorded before completeFile() returns... perhaps something like
${dfs.replication.min.user}.
Alternatively, we can leave dfs.replication.min alone for hard guarantees and
add ${dfs.replication.min.for.block.completion}, which could be left at 1 even
when dfs.replication.min is > 1. Files would then complete normally, while any
that fall below the minimum replication factor can still be monitored and
accounted for later.
I'm preferring the second option myself. Will post a patch with tests soon.
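Under the second option, hdfs-site.xml might look like the sketch below.
The second property name is only the one proposed in this issue (not yet in
any release), and the values are illustrative:

```xml
<!-- Existing hard guarantee: completeFile() does not succeed unless this
     many replicas of each block have been recorded. -->
<property>
  <name>dfs.replication.min</name>
  <value>2</value>
</property>

<!-- Proposed: blocks may complete with just one recorded replica; files
     whose replication sits below dfs.replication.min can then be monitored
     and accounted for later rather than failing close(). -->
<property>
  <name>dfs.replication.min.for.block.completion</name>
  <value>1</value>
</property>
```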