Sumadhur Reddy Bolli created HDFS-3564:
------------------------------------------
Summary: Make the replication policy pluggable to allow custom
replication policies
Key: HDFS-3564
URL: https://issues.apache.org/jira/browse/HDFS-3564
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Reporter: Sumadhur Reddy Bolli
ReplicationTargetChooser currently determines the placement of replicas in
Hadoop. Making the replication policy pluggable would allow custom replication
policies suited to a given deployment environment.
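A minimal sketch of what "pluggable" could mean here. It assumes the NameNode reads a class name from its configuration and instantiates that class by reflection. The interface `ReplicaPlacementPolicy`, the classes `FirstNPolicy` and `LastNPolicy`, and the key `example.replication.policy.class` are all hypothetical and are not Hadoop's actual API. The two policies are deliberately trivial so the loading mechanism stays visible:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch: names below are illustrative, not Hadoop's real API.
public class PluggablePolicyDemo {

    /** Hypothetical contract that every replica placement policy would implement. */
    public interface ReplicaPlacementPolicy {
        /** Returns up to {@code replication} target nodes chosen from {@code candidates}. */
        List<String> chooseTargets(List<String> candidates, int replication);
    }

    /** Stand-in default policy: takes the first N candidates. */
    public static class FirstNPolicy implements ReplicaPlacementPolicy {
        @Override
        public List<String> chooseTargets(List<String> candidates, int replication) {
            return new ArrayList<>(candidates.subList(0, Math.min(replication, candidates.size())));
        }
    }

    /** Stand-in "custom" policy: takes the last N candidates instead. */
    public static class LastNPolicy implements ReplicaPlacementPolicy {
        @Override
        public List<String> chooseTargets(List<String> candidates, int replication) {
            int from = Math.max(0, candidates.size() - replication);
            return new ArrayList<>(candidates.subList(from, candidates.size()));
        }
    }

    /** Hypothetical configuration key naming the policy class to load. */
    static final String POLICY_KEY = "example.replication.policy.class";

    /** Instantiates the configured policy by reflection, defaulting to FirstNPolicy. */
    static ReplicaPlacementPolicy loadPolicy(Properties conf) {
        String className = conf.getProperty(POLICY_KEY, FirstNPolicy.class.getName());
        try {
            // asSubclass fails fast if the class does not implement the interface.
            return Class.forName(className)
                    .asSubclass(ReplicaPlacementPolicy.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate policy " + className, e);
        }
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("dn1", "dn2", "dn3", "dn4");
        Properties conf = new Properties();
        System.out.println(loadPolicy(conf).chooseTargets(nodes, 3)); // [dn1, dn2, dn3]
        // Swapping the policy is a configuration change, not a code change.
        conf.setProperty(POLICY_KEY, LastNPolicy.class.getName());
        System.out.println(loadPolicy(conf).chooseTargets(nodes, 3)); // [dn2, dn3, dn4]
    }
}
```

The design choice is that the caller depends only on the interface, so a deployment changes placement behaviour by editing one configuration value rather than patching ReplicationTargetChooser itself.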
Eg1: Placing replicas across different datacenters (not just racks)
Eg2: Placing replicas across more than two racks
Eg3: Cloud environments such as Azure have logical concepts called fault domains
and upgrade domains. Each fault domain spans multiple upgrade domains, and each
upgrade domain spans multiple fault domains; machines are typically spread
evenly across both. Fault domain failures are typically catastrophic, unplanned
failures with a high likelihood of data loss. Azure periodically takes an
upgrade domain down for maintenance, and each time it does, a small percentage
of the machines in that domain (typically 1-2%) are replaced because of disk
failures, losing the data stored on them. With the default replication factor
of 3, if the 3 DataNodes holding a block's replicas go down at the same time,
that block may be lost. It is therefore important to have a policy that spreads
replicas across both fault and upgrade domains so that data loss is practically
eliminated. The problem here is two-dimensional, whereas the default policy in
Hadoop is one-dimensional. Custom policies that address issues like these could
be written if the policy were pluggable.
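The placement rule behind Eg1-Eg3 can be sketched as one greedy routine that treats each topology level (datacenter, rack, fault domain, upgrade domain) as a dimension and avoids reusing any label in any dimension. This is an illustrative assumption, not an existing Hadoop policy: `MultiDomainPlacement`, `Node`, and the domain labels are hypothetical. Greedy first-fit is also a simplification; it can miss a valid placement that a smarter search (for example, bipartite matching) would find:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: keep replicas in distinct failure domains along every
// configured dimension (rack, datacenter, fault domain, upgrade domain, ...).
public class MultiDomainPlacement {

    /** A candidate DataNode and its label in each dimension, e.g. [faultDomain, upgradeDomain]. */
    record Node(String name, List<String> domains) {}

    static List<String> chooseTargets(List<Node> candidates, int replication) {
        List<String> names = new ArrayList<>();
        if (candidates.isEmpty()) return names;
        int dims = candidates.get(0).domains().size();
        // One set of already-used labels per dimension.
        List<Set<String>> used = new ArrayList<>();
        for (int d = 0; d < dims; d++) used.add(new HashSet<>());

        List<Node> chosen = new ArrayList<>();
        // Pass 1 (greedy first-fit): accept a node only if its label is new in
        // EVERY dimension, i.e. it shares no domain with any earlier replica.
        for (Node n : candidates) {
            if (chosen.size() == replication) break;
            boolean allNew = true;
            for (int d = 0; d < dims; d++) {
                if (used.get(d).contains(n.domains().get(d))) { allNew = false; break; }
            }
            if (allNew) {
                chosen.add(n);
                for (int d = 0; d < dims; d++) used.get(d).add(n.domains().get(d));
            }
        }
        // Pass 2 (fallback): if the strict rule could not place every replica,
        // fill the remaining slots with any other candidate rather than fail the write.
        for (Node n : candidates) {
            if (chosen.size() == replication) break;
            if (!chosen.contains(n)) chosen.add(n);
        }
        for (Node n : chosen) names.add(n.name());
        return names;
    }

    public static void main(String[] args) {
        // Eg3-style layout: labels are [faultDomain, upgradeDomain].
        List<Node> nodes = List.of(
            new Node("dn1", List.of("fd0", "ud0")),
            new Node("dn2", List.of("fd0", "ud1")),
            new Node("dn3", List.of("fd1", "ud0")),
            new Node("dn4", List.of("fd1", "ud1")),
            new Node("dn5", List.of("fd2", "ud2")));
        System.out.println(chooseTargets(nodes, 3)); // [dn1, dn4, dn5]
    }
}
```

For Eg1 or Eg2, each node would carry a single label (its datacenter or its rack); for Eg3 it carries two, as in `main`. The fallback pass trades strict spreading for write availability; a real policy would have to decide whether to do that or to reject the placement.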