[ 
https://issues.apache.org/jira/browse/HDFS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246822#comment-14246822
 ] 

Hari Sekhon commented on HDFS-5442:
-----------------------------------

MapR's approach to DR is perhaps the best in the Hadoop world right now. 
MapR-FS takes snapshots and replicates those snapshots to the other site. Once 
a snapshot is fully copied, it is atomically enabled at the other site.

This is the best possible scenario for consistency and has worked well in 
practice, including the built-in scheduling.

So perhaps HDFS needs two administrative options for DR, depending on what 
is required:

1. Streaming continuous block replication (inconsistent unless you guarantee 
block write ordering, which WANdisco does not)
2. Atomic snapshot mirroring + enabling at the other site, like MapR-FS

I suspect option 2 will require some improvements to HDFS snapshots to allow 
rolling forward a snapshot at the DR site once the copy is complete.
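
To make the idea concrete, here is a minimal sketch of the snapshot-mirroring 
workflow in option 2, using the stock FileSystem snapshot API plus distcp for 
the cross-cluster copy. Cluster addresses, paths and snapshot names are made 
up, and the final "roll forward" step is exactly the part HDFS does not offer 
today:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.net.URI;

    public class SnapshotMirrorSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Source cluster: take a point-in-time snapshot of the replicated tree.
            // The directory must already be snapshottable
            // (hdfs dfsadmin -allowSnapshot /data).
            FileSystem srcFs = FileSystem.get(URI.create("hdfs://primary-nn:8020"), conf);
            Path dataDir = new Path("/data");
            Path snapshot = srcFs.createSnapshot(dataDir, "dr-" + System.currentTimeMillis());

            // Ship the immutable snapshot to the DR cluster, e.g.:
            //   hadoop distcp hdfs://primary-nn:8020/data/.snapshot/<name> \
            //                 hdfs://dr-nn:8020/data-staging
            // Only once the copy completes would the DR side atomically expose
            // ("roll forward") the new version, which is the missing piece in HDFS.
            System.out.println("Snapshot ready to mirror: " + snapshot);
        }
    }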

Option 2 also allows for schedule changes, i.e. a snapshot copy every 15 
minutes, every hour or every day, so you only ship the net changes and not 
every single intermediate change. That may mean less data copied (although I 
doubt that in practice unless people are rewriting/replacing datasets, e.g. 
HBase compactions).
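
The "net changes only" point is visible in the snapshot diff between two 
points in time: a file rewritten many times in between shows up once. A short 
sketch using the DistributedFileSystem snapshot-diff API, with hypothetical 
snapshot names, assuming the two snapshots already exist:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.SnapshotDiffReport;
    import java.net.URI;

    public class SnapshotDiffSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            DistributedFileSystem dfs = (DistributedFileSystem)
                    DistributedFileSystem.get(URI.create("hdfs://primary-nn:8020"), conf);
            // Files rewritten repeatedly between the two snapshots appear once in
            // the report, so an hourly or daily mirror only ships the final state.
            SnapshotDiffReport diff = dfs.getSnapshotDiffReport(
                    new Path("/data"), "dr-0900", "dr-1000");
            System.out.println(diff);
        }
    }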

Regardless of the solution, there must be configurable path exclusions, such 
as for /tmp and other locations of intermediate data.
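
For illustration only, an exclusion list of this kind could be expressed as a 
simple PathFilter applied when building the copy listing; the prefixes here 
are hypothetical and would be admin-configurable in a real tool:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.PathFilter;
    import java.util.Arrays;
    import java.util.List;

    public class DrPathExclusions implements PathFilter {
        // Hypothetical exclusion list; in practice this comes from configuration.
        private static final List<String> EXCLUDED_PREFIXES =
                Arrays.asList("/tmp", "/app-logs", "/data/_staging");

        @Override
        public boolean accept(Path path) {
            String p = Path.getPathWithoutSchemeAndAuthority(path).toString();
            for (String prefix : EXCLUDED_PREFIXES) {
                if (p.equals(prefix) || p.startsWith(prefix + "/")) {
                    return false;   // skip the whole excluded tree
                }
            }
            return true;            // everything else is eligible for replication
        }
    }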

> Zero loss HDFS data replication for multiple datacenters
> --------------------------------------------------------
>
>                 Key: HDFS-5442
>                 URL: https://issues.apache.org/jira/browse/HDFS-5442
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Avik Dey
>            Assignee: Dian Fu
>         Attachments: Disaster Recovery Solution for Hadoop.pdf, Disaster 
> Recovery Solution for Hadoop.pdf, Disaster Recovery Solution for Hadoop.pdf
>
>
> Hadoop is architected to operate efficiently at scale for normal hardware 
> failures within a datacenter. Hadoop is not designed today to handle 
> datacenter failures. Although HDFS is not designed for nor deployed in 
> configurations spanning multiple datacenters, replicating data from one 
> location to another is common practice for disaster recovery and global 
> service availability. There are current solutions available for batch 
> replication using data copy/export tools. However, while providing some 
> backup capability for HDFS data, they do not provide the capability to 
> recover all your HDFS data from a datacenter failure and be up and running 
> again with a fully operational Hadoop cluster in another datacenter in a 
> matter of minutes. For disaster recovery from a datacenter failure, we should 
> provide a fully distributed, zero data loss, low latency, high throughput and 
> secure HDFS data replication solution for multiple datacenter setup.
> Design and code for Phase-1 to follow soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
