[
https://issues.apache.org/jira/browse/HDFS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021165#comment-14021165
]
Vinayakumar B commented on HDFS-5442:
-------------------------------------
Hi Chris, thanks for taking a look at the design.
Yes, I agree that documentation is a very important task for this feature, and
I am sure that with all your support we can produce very good documentation for it.
{quote}IMO, the choice to activate the mirror must not be taken lightly.
Suppose there is a network partition such that all nodes in DC1 have
connectivity to each other, and all nodes in DC2 have connectivity to each
other, but there is no connectivity between DC1 and DC2. In this scenario, it's
possible that client applications are still generating edits inside DC1, but an
operator in DC2 won't be able to determine that. If an operator activates the
mirror in DC2, starts running client applications in DC2, and then connectivity
is restored between DC1 and DC2, then we have a split-brain scenario. Since
there is no reconciliation in the design, I believe the operator's only choice
at that point is to discard completely the instance in DC1 or the instance in
DC2. This is another area needing clear documentation, so that system
administrators fully understand the risk.{quote}
Yes, there will always be a chance of a split-brain scenario when the
inter-datacenter connection is broken. There are multiple things that need to
be taken care of before activating the mirror. Of course, this is manual (at
least for now):
1. Validate whether the primary DC is actually down, or whether only the
connection between the datacenters is broken (a rough operator-side probe is
sketched below).
2. If only the connection between the datacenters is broken, then in
synchronous mode all client operations will fail, because writing the edits
synchronously will fail. In asynchronous mode, operations will not fail, but
there will be a resync mechanism.
3. Since, as mentioned above, there is no chance of missing any edits, the
operator need not discard the complete instance of either DC.
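To make the first point above concrete, here is a minimal, illustrative operator-side probe (not part of the design or the attached patches) that checks whether the primary NameNode's RPC endpoint is reachable before the mirror is activated. The host name and port are placeholders, and a probe from DC2 alone cannot distinguish "primary down" from "inter-DC link broken", so an out-of-band check from inside DC1 is still required.
{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Illustrative sketch only: probe the primary DC's NameNode RPC endpoint
 * before activating the mirror. A failed probe from DC2 may mean the primary
 * is down, or merely that the inter-DC link is broken.
 */
public class PrimaryNameNodeProbe {
  public static boolean isReachable(String host, int port, int timeoutMs) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), timeoutMs);
      return true;   // RPC port accepted a TCP connection
    } catch (IOException e) {
      return false;  // unreachable from this vantage point
    }
  }

  public static void main(String[] args) {
    // Placeholder address for the primary DC (DC1) NameNode
    boolean reachable = isReachable("nn1.dc1.example.com", 8020, 5000);
    System.out.println("Primary NameNode reachable from DC2: " + reachable);
  }
}
{code}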
{quote}The jira description states as a goal that a replicated copy of the
cluster should be running again in another DC "in a matter of minutes". Just as
a meta-observation, it appears that the process of activating the mirror in the
second DC would require significant reconfiguration of the mirror, a restart,
and then of course waiting on block reports to get out of safe mode. This makes
me wonder if a system administrator really can execute on this in a matter of
just minutes. I apologize if I'm misunderstanding. I'll start taking a look at
the patches on the sub-tasks, and perhaps that will shed some further
light.{quote}
Basically, as per the current design, only one configuration property
(dfs.region.primary) needs to be changed manually to activate the mirror,
followed by a cluster restart. Of course, we can improve this further if an
automatic failover mechanism is implemented later.
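Just as an illustration of that single switch, a minimal sketch is below. Only the property name dfs.region.primary comes from the current design; the region values ("dc1", "dc2") and reading them through HdfsConfiguration are my assumptions, since in practice the operator would simply edit hdfs-site.xml on the mirror cluster and restart.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

/**
 * Sketch of the single configuration change that promotes the mirror.
 * Region values "dc1"/"dc2" are hypothetical; only dfs.region.primary is
 * taken from the current design.
 */
public class MirrorActivationSketch {
  public static void main(String[] args) {
    Configuration conf = new HdfsConfiguration(); // loads hdfs-site.xml from the classpath
    System.out.println("Primary region before failover: "
        + conf.get("dfs.region.primary", "dc1"));

    // On failover the mirror DC is promoted by pointing the primary region at itself,
    // after which the cluster is restarted.
    conf.set("dfs.region.primary", "dc2");
    System.out.println("Primary region after failover: "
        + conf.get("dfs.region.primary"));
  }
}
{code}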
In asynchronous mode, if some blocks were missed during replication from the
primary, the missed blocks need to be deleted after activating and restarting
the cluster.
But in synchronous mode, all recent data will be available on both the
primary and mirror clusters; in this case the NameNode will only stay in
safemode for a short time due to the restart.
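As a rough sketch of that last step, and assuming fs.defaultFS on the operator's machine already points at the activated mirror cluster, waiting for the NameNode to leave safemode after the restart could look like the following (equivalent to running "hdfs dfsadmin -safemode wait"):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.SafeModeAction;

/** Wait until the (mirror) NameNode has left safemode after the restart. */
public class WaitForSafeModeExit {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // assumes fs.defaultFS points at the mirror
    try (FileSystem fs = FileSystem.get(conf)) {
      DistributedFileSystem dfs = (DistributedFileSystem) fs;
      while (dfs.setSafeMode(SafeModeAction.SAFEMODE_GET)) { // true while still in safemode
        System.out.println("NameNode still in safemode, waiting for block reports...");
        Thread.sleep(5000);
      }
      System.out.println("Safemode is off; the mirror cluster is ready to serve.");
    }
  }
}
{code}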
{quote}Have you considered the workflow for doing a rolling upgrade on a
paired primary and mirror? I suspect there are some challenges here around
coordinating the edit log roll and the edit log op codes to start or finalize a
rolling upgrade. It seems the mirror must have the new software version fully
deployed first, before new edits related to new features start flowing from the
primary.{quote}
I think rolling upgrade was not considered while making this design. As of
now, it is assumed that both clusters are running the same version of Hadoop.
Rolling of the edit logs on both sides has been considered carefully and
implemented. You can review the patch in one of the subtasks.
{quote}Is the DataNode aware of its region ID? If so, then the DataNode could
check the region ID of the NameNode after registration to prevent
misconfigurations, similar to what we do for block pool ID and cluster
ID.{quote}
As of now, a DataNode registers only with its own region's NameNode; there is
no cross-DC DataNode registration. The regionId is not carried via RPC; it is
just a configuration item used to identify the correct NameNode.
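Just to illustrate what "configuration-only" region identification means, here is a hypothetical sketch. The keys dfs.region.id and dfs.namenode.rpc-address.<region> are placeholders made up for illustration, not the actual keys used in the patches; the point is only that a DataNode resolves its own region's NameNode from local configuration and never sends the region id over RPC.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

/**
 * Hypothetical sketch: the region id is a purely local configuration item
 * used to pick the right NameNode address; it is never carried over RPC.
 * Key names below are placeholders, not the keys from the actual patches.
 */
public class RegionAwareNameNodeLookup {
  public static void main(String[] args) {
    Configuration conf = new HdfsConfiguration();

    String regionId = conf.get("dfs.region.id", "dc1"); // placeholder key
    String nnAddress = conf.get("dfs.namenode.rpc-address." + regionId, // placeholder key
        "nn1." + regionId + ".example.com:8020");

    System.out.println("DataNode in region " + regionId
        + " registers with NameNode at " + nnAddress);
  }
}
{code}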
Hope I have answered all of your queries. Suggestions and queries are always
welcome. :)
> Zero loss HDFS data replication for multiple datacenters
> --------------------------------------------------------
>
> Key: HDFS-5442
> URL: https://issues.apache.org/jira/browse/HDFS-5442
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Avik Dey
> Assignee: Dian Fu
> Attachments: Disaster Recovery Solution for Hadoop.pdf, Disaster
> Recovery Solution for Hadoop.pdf
>
>
> Hadoop is architected to operate efficiently at scale for normal hardware
> failures within a datacenter. Hadoop is not designed today to handle
> datacenter failures. Although HDFS is not designed for nor deployed in
> configurations spanning multiple datacenters, replicating data from one
> location to another is common practice for disaster recovery and global
> service availability. There are current solutions available for batch
> replication using data copy/export tools. However, while providing some
> backup capability for HDFS data, they do not provide the capability to
> recover all your HDFS data from a datacenter failure and be up and running
> again with a fully operational Hadoop cluster in another datacenter in a
> matter of minutes. For disaster recovery from a datacenter failure, we should
> provide a fully distributed, zero data loss, low latency, high throughput and
> secure HDFS data replication solution for multiple datacenter setup.
> Design and code for Phase-1 to follow soon.
--
This message was sent by Atlassian JIRA
(v6.2#6252)