Sumadhur Reddy Bolli created HDFS-3566:
------------------------------------------
Summary: Custom Replication Policy for Azure
Key: HDFS-3566
URL: https://issues.apache.org/jira/browse/HDFS-3566
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Reporter: Sumadhur Reddy Bolli
Azure has logical concepts called fault domains and upgrade domains. Each fault
domain spans multiple upgrade domains, and each upgrade domain spans multiple
fault domains. Machines are typically spread evenly across both fault and
upgrade domains. Fault domain failures are typically catastrophic, unplanned
failures, and the possibility of data loss is high. An upgrade domain can be
taken down by Azure periodically for maintenance. Each time an upgrade domain
is taken down, a small percentage of its machines (typically 1-2%) are replaced
due to disk failures, thus losing data. Assuming the default replication factor
of 3, any 3 datanodes going down at the same time would mean potential data loss.
So it is important to have a policy that spreads replicas across both fault
and upgrade domains to ensure practically no data loss. The problem here is
two-dimensional, while the default policy in Hadoop is one-dimensional. This
policy would spread the replicas across at least two fault domains and three
upgrade domains to prevent data loss.
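
The two-dimensional constraint can be illustrated with a minimal sketch. This
is not the proposed NameNode implementation: the Node record, the function
names, and the brute-force search are all assumptions made for illustration,
and a real policy would need a far cheaper selection strategy.

```python
import itertools
import random
from collections import namedtuple

# Hypothetical node record; the field names are illustrative, not Hadoop APIs.
Node = namedtuple("Node", ["name", "fault_domain", "upgrade_domain"])


def is_valid_placement(nodes, min_fault_domains=2, min_upgrade_domains=3):
    """Return True if the nodes cover enough distinct fault and upgrade domains."""
    fault_domains = {n.fault_domain for n in nodes}
    upgrade_domains = {n.upgrade_domain for n in nodes}
    return (len(fault_domains) >= min_fault_domains
            and len(upgrade_domains) >= min_upgrade_domains)


def choose_targets(cluster, replication=3, rng=None):
    """Pick `replication` nodes that satisfy both constraints, or None.

    Brute force over all combinations: acceptable for a sketch, far too slow
    for a real cluster.
    """
    rng = rng or random.Random()
    candidates = list(cluster)
    rng.shuffle(candidates)  # avoid always favouring the same nodes
    for combo in itertools.combinations(candidates, replication):
        if is_valid_placement(combo):
            return list(combo)
    return None


# Example: 3 fault domains x 3 upgrade domains, one node per cell.
cluster = [Node(f"dn-{fd}-{ud}", fd, ud) for fd in range(3) for ud in range(3)]
targets = choose_targets(cluster, rng=random.Random(0))
```

With replication factor 3, requiring three distinct upgrade domains means that
taking down any single upgrade domain affects at most one replica of a block,
and requiring two fault domains means that a single fault domain failure
cannot remove all three.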