[jira] [Commented] (HDFS-9075) Multiple datacenter replication inside one HDFS cluster

Chris Nauroth (JIRA) Tue, 15 Sep 2015 13:09:21 -0700

    [ https://issues.apache.org/jira/browse/HDFS-9075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746060#comment-14746060 ]


Chris Nauroth commented on HDFS-9075:
-------------------------------------

HDFS-1432 and HDFS-5442 are prior proposals for multiple data center support.  
Both appear to be inactive or abandoned at this point.

> Multiple datacenter replication inside one HDFS cluster
> -------------------------------------------------------
>
>                 Key: HDFS-9075
>                 URL: https://issues.apache.org/jira/browse/HDFS-9075
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: He Tianyi
>            Assignee: He Tianyi
>
> It is a common scenario to deploy multiple datacenters for scaling and
> disaster tolerance.
> In this case we want data to be shared transparently (to the user) across
> datacenters.
> For example, say we store a raw user action log daily, and different
> computations take that log as input. As scale grows, we may want to
> schedule various kinds of computations in more than one datacenter.
> As far as I know, the current solution is to deploy multiple independent
> clusters, one per datacenter, and use {{distcp}} to sync data files between
> them.
> But in this case, the user needs to know exactly where data is stored, and
> mistakes may be made during manual operations. After all, this is
> basically a job for the computer.
> Based on these facts, a multiple datacenter replication solution inside one
> cluster could address this scenario.
> I am working on a prototype that works with 2 datacenters. The goal is to
> provide data replication between datacenters transparently while minimizing
> inter-DC bandwidth usage. The basic idea is to replicate blocks to both DCs
> and determine the number of replicas from historical statistics of access
> behavior for that part of the namespace.
> I will post a design document soon.
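The basic idea in the description above, placing replicas in both datacenters and deciding how many each one gets from historical access statistics for that part of the namespace, can be illustrated with a small allocation sketch. This is not the design from the promised document: the function name `replicas_per_dc`, the per-DC read counts used as "access statistics", the default of three replicas, the one-replica-per-DC floor, and the largest-remainder rounding are all hypothetical choices made for illustration.

```python
def replicas_per_dc(access_counts, total_replicas=3, min_per_dc=1):
    """Split one block's replicas across datacenters (hypothetical policy).

    access_counts: mapping of datacenter name -> historical read count for
    the namespace subtree that contains the block (assumed statistic).
    Every datacenter first gets `min_per_dc` replicas so data stays available
    locally; the spare replicas are then assigned in proportion to each
    datacenter's share of reads, using largest-remainder rounding so the
    total is exactly `total_replicas`.
    """
    dcs = sorted(access_counts)
    base = min_per_dc * len(dcs)
    if total_replicas < base:
        raise ValueError("total_replicas is too small to give every DC min_per_dc")

    alloc = {dc: min_per_dc for dc in dcs}
    spare = total_replicas - base
    total_reads = sum(access_counts.values())
    if spare == 0:
        return alloc
    if total_reads == 0:
        # No access history yet: spread the spare replicas round-robin.
        for i in range(spare):
            alloc[dcs[i % len(dcs)]] += 1
        return alloc

    # Each DC's ideal (fractional) share of the spare replicas.
    quotas = {dc: spare * access_counts[dc] / total_reads for dc in dcs}
    for dc in dcs:
        alloc[dc] += int(quotas[dc])

    # Hand out what rounding down left over, largest fractional part first.
    leftover = total_replicas - sum(alloc.values())
    by_remainder = sorted(dcs, key=lambda dc: quotas[dc] - int(quotas[dc]), reverse=True)
    for dc in by_remainder[:leftover]:
        alloc[dc] += 1
    return alloc
```

For example, `replicas_per_dc({"dc1": 900, "dc2": 100}, total_replicas=4)` gives `{"dc1": 3, "dc2": 1}`: each DC keeps one copy, and the read-heavy DC receives the extra replicas, so fewer reads would need to cross the inter-DC link.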



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
