[ 
https://issues.apache.org/jira/browse/HDFS-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412463#comment-13412463
 ] 

Sumadhur Reddy Bolli commented on HDFS-3564:
--------------------------------------------

Making the policy pluggable should be sufficient. I will re-purpose this JIRA to
suggest enhancements to the existing abstraction. Network topology is not known
to users in Azure, and it is not strictly hierarchical in nature: fault
domains span upgrade domains, and upgrade domains can span fault domains.
However, I do not see much value in changing the internal abstractions for
topology, as we do not know the underlying physical topology in Azure. I will
post a document with the details on JIRA 3566 to explain this better.
                
> Make the replication policy pluggable to allow custom replication policies
> --------------------------------------------------------------------------
>
>                 Key: HDFS-3564
>                 URL: https://issues.apache.org/jira/browse/HDFS-3564
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: Sumadhur Reddy Bolli
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> ReplicationTargetChooser currently determines the placement of replicas in
> Hadoop. Making the replication policy pluggable would allow custom
> replication policies that suit the environment.
> Eg1: Placing replicas across different datacenters (not just racks).
> Eg2: Placing replicas across more than two racks.
> Eg3: Cloud environments like Azure have logical concepts such as fault and
> upgrade domains. Each fault domain spans multiple upgrade domains, and each
> upgrade domain spans multiple fault domains. Machines are typically spread
> evenly across both fault and upgrade domains. Fault domain failures are
> typically catastrophic, unplanned failures, and the possibility of data loss
> is high. Azure may periodically take an upgrade domain down for maintenance.
> Each time an upgrade domain is taken down, a small percentage of the machines
> in it (typically 1-2%) are replaced due to disk failures, and the data on
> them is lost. With the default replication factor of 3, any 3 data nodes
> going down at the same time could mean data loss. So it is important to
> have a policy that spreads replicas across both fault and upgrade domains,
> so that data loss is practically eliminated. The problem here is
> two-dimensional, while the default policy in Hadoop is one-dimensional.
> Custom policies addressing issues like these could be written if the policy
> were made pluggable.
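The two-dimensional requirement in Eg3 can be illustrated with a small sketch. `DomainNode` and `TwoDimensionalChooser` are hypothetical names, not part of HDFS or its placement API; the sketch assumes each data node's fault and upgrade domain are already known as integers. It uses a greedy first-fit heuristic: a candidate is accepted only if neither its fault domain nor its upgrade domain already holds a replica of the block.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical: a data node's position in Azure's two logical dimensions.
record DomainNode(String name, int faultDomain, int upgradeDomain) {}

// Hypothetical chooser, not the HDFS placement API. Greedy first-fit over
// the candidate list, enforcing distinct fault AND upgrade domains.
final class TwoDimensionalChooser {
    private TwoDimensionalChooser() {}

    static List<DomainNode> chooseTargets(List<DomainNode> candidates, int replication) {
        List<DomainNode> chosen = new ArrayList<>();
        Set<Integer> usedFaultDomains = new HashSet<>();
        Set<Integer> usedUpgradeDomains = new HashSet<>();
        for (DomainNode node : candidates) {
            if (chosen.size() == replication) {
                break; // enough replicas placed
            }
            if (usedFaultDomains.contains(node.faultDomain())
                    || usedUpgradeDomains.contains(node.upgradeDomain())) {
                continue; // would share a fault or upgrade domain with an earlier replica
            }
            chosen.add(node);
            usedFaultDomains.add(node.faultDomain());
            usedUpgradeDomains.add(node.upgradeDomain());
        }
        // Greedy first-fit can return fewer than `replication` nodes even when
        // a valid set exists; a real policy would need a fallback.
        return chosen;
    }
}
```

On a 3x3 grid of nodes (fault domains 0-2 by upgrade domains 0-2) listed in row order, this picks the nodes at (0,0), (1,1) and (2,2), so losing any single fault domain or any single upgrade domain removes at most one replica. Choosing an order-independent method (for example, a bipartite matching between fault and upgrade domains) would avoid the greedy failure case, at the cost of more work per block.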
