[ https://issues.apache.org/jira/browse/HDFS-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated HDFS-2936:
--------------------------

    Attachment: HDFS-2936.patch

I had this done around submission time, but lost my changes when my Mac's disk 
crashed. I re-did the separation logic over the weekend.

Briefly, for those interested in reviewing it, the patch makes the following 
changes:
* The role of the current {{dfs.namenode.replication.min}} has been changed: it 
now only applies restrictions at the user level, i.e. to replication factors at 
file creation and to file replication factor adjustments.
* A new property, {{dfs.namenode.replication.min.for.write}}, applies to all 
write conditions, such as adding a block, closing a block, etc. The former 
property used to control these layers as well; I've split that behavior into a 
separate property in case such hard guarantees aren't required for anything 
beyond user-level restrictions (a quick configuration sketch follows this list).
* There were no tests for min-replication, so I added some to TestFileCreation.
* I also added a few regression tests for this property split to TestReplication.
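
To make the split concrete, here's a minimal sketch of how the two properties 
could be wired up in a test. Only the property names come from this patch; the 
{{MiniDFSCluster}} wiring and the chosen values are illustrative, not part of 
the attached changes:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

// Illustrative values only: reject create()/setReplication() requests below 2,
// but let the write path (block allocation, completeFile) proceed with 1.
Configuration conf = new HdfsConfiguration();
conf.setInt("dfs.namenode.replication.min", 2);           // user-level check
conf.setInt("dfs.namenode.replication.min.for.write", 1); // hard write-path check

// A 3-DN cluster can satisfy the user-level minimum while still letting
// close() succeed if the pipeline degrades to a single replica.
MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(3).build();
{code}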

This patch is still WIP: while it tests the good conditions, it still needs a 
test for the bad/violation conditions. The last test in TestReplication needs 
more work before it can run reliably (and not hang at shutdown). The issue it 
faces is that the {{DFSClient.close()}} call NEVER exits if it can't close the 
file for min-replication reasons. It also happily eats InterruptedException, 
making it difficult for me to write a test with waitForXSeconds-then-interrupt 
conditions. However, I'll find another way and fix that shortly (unless we 
discuss limiting the {{completeFile}} retries, which are currently infinite and 
attempted every 0.4 seconds). One workaround I'm considering is sketched below.
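
As a rough illustration (the {{closeWithTimeout}} helper name and the plain 
daemon-thread approach are mine, not part of the attached patch), the test 
could drive {{close()}} from a bounded worker thread, so a never-returning 
{{completeFile()}} shows up as a test failure rather than a hang:

{code:java}
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;
import org.apache.hadoop.fs.FSDataOutputStream;
import static org.junit.Assert.assertFalse;

// Hypothetical test helper, not part of the attached patch: bounds close()
// with a daemon thread + join(), since DFSClient.close() swallows
// InterruptedException and may otherwise retry completeFile() forever.
private void closeWithTimeout(final FSDataOutputStream out, long timeoutMs)
    throws InterruptedException {
  final AtomicReference<IOException> failure = new AtomicReference<IOException>();
  Thread closer = new Thread(new Runnable() {
    public void run() {
      try {
        out.close();
      } catch (IOException e) {
        failure.set(e); // recorded for assertions back on the test thread
      }
    }
  });
  closer.setDaemon(true); // a hung close() then can't block test/JVM shutdown
  closer.start();
  closer.join(timeoutMs);
  assertFalse("close() hung waiting on min-replication", closer.isAlive());
}
{code}

The {{join()}} timeout converts an indefinitely retrying close() into an 
assertion failure, and the daemon flag keeps a stuck thread from blocking 
MiniDFSCluster shutdown.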
                
> Provide a better way to specify a HDFS-wide minimum replication requirement
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-2936
>                 URL: https://issues.apache.org/jira/browse/HDFS-2936
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Harsh J
>            Assignee: Harsh J
>         Attachments: HDFS-2936.patch
>
>
> Currently, an admin who would like to enforce a replication factor for all 
> files on HDFS has no way to do so. They could arguably set 
> dfs.replication.min, but that is a very hard guarantee: if the pipeline 
> can't satisfy that number due to some failure, close() does not succeed on 
> the file being written, which leads to several issues.
> After discussing with Todd, we feel it would make sense to introduce a second 
> config (defaulting to ${dfs.replication.min}) that would act as the minimum 
> specified replication for files. This is different from dfs.replication.min, 
> which also ensures that many replicas are recorded before completeFile() 
> returns... perhaps something like ${dfs.replication.min.user}. Alternatively, 
> we can leave dfs.replication.min alone for hard guarantees and add 
> ${dfs.replication.min.for.block.completion}, which could be left at 1 even if 
> dfs.replication.min is >1, letting files complete normally even at a low 
> replication factor (so they can be monitored and accounted for later).
> I'm preferring the second option myself. Will post a patch with tests soon.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
