[
https://issues.apache.org/jira/browse/HDDS-12307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925887#comment-17925887
]
Ivan Andika commented on HDDS-12307:
------------------------------------
This is still in a very early phase. Any ideas welcome.
> Realtime Cross-Region Bucket Replication
> ----------------------------------------
>
> Key: HDDS-12307
> URL: https://issues.apache.org/jira/browse/HDDS-12307
> Project: Apache Ozone
> Issue Type: New Feature
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Major
>
> Currently, there are a few cross-region (geo-replicated) DR solutions for
> Ozone buckets:
> * Run a periodic distcp from the source bucket to the target bucket
> * Take a snapshot of the bucket and send it to the remote DR site
> ([https://ozone.apache.org/docs/edge/feature/snapshot.html])
> There are pros and cons to the current approaches:
> * Pros
> ** It is simple: periodic jobs can be set up quite easily (e.g. using
> cron jobs)
> ** DistCp sets up a MapReduce job that parallelizes the copy from the
> source cluster to the target cluster
> ** No additional components needed
> * Cons
> ** It is not "realtime": the freshness of the data depends on how
> frequently and how fast the distcp runs
> ** It incurs significant overhead on the source cluster: it requires
> scanning all the files in the source cluster (and possibly in the target
> cluster)
> Cloudera Replication Manager
> ([https://docs.cloudera.com/replication-manager/1.5.4/replication-policies/topics/rm-pvce-understand-ozone-replication-policy.html])
> adds incremental replication support after the initial bootstrap step by
> taking the snapdiff between two snapshots. This is better since there is no
> need to list all the keys under the bucket again, but it's still not
> technically realtime.
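To make the snapdiff idea concrete, here is a minimal, self-contained sketch in plain Java. All types here are hypothetical stand-ins, not Ozone's actual snapshot-diff API: each snapshot is modeled as a key-to-content-hash map, and the diff yields only the keys that need to be re-copied.

```java
import java.util.*;

// Hypothetical sketch: compute the diff between two bucket snapshots so
// only changed keys need to be replicated, instead of listing the whole
// bucket on every run.
public class SnapDiffSketch {
    public enum ChangeType { CREATE, MODIFY, DELETE }

    public record Change(String key, ChangeType type) {}

    // Each snapshot is modeled as key -> content hash.
    public static List<Change> diff(Map<String, String> from, Map<String, String> to) {
        List<Change> changes = new ArrayList<>();
        for (Map.Entry<String, String> e : to.entrySet()) {
            String old = from.get(e.getKey());
            if (old == null) {
                changes.add(new Change(e.getKey(), ChangeType.CREATE));
            } else if (!old.equals(e.getValue())) {
                changes.add(new Change(e.getKey(), ChangeType.MODIFY));
            }
        }
        for (String key : from.keySet()) {
            if (!to.containsKey(key)) {
                changes.add(new Change(key, ChangeType.DELETE));
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        // snap1 -> snap2: a.txt modified, b.txt deleted, c.txt created
        Map<String, String> snap1 = Map.of("a.txt", "h1", "b.txt", "h2");
        Map<String, String> snap2 = Map.of("a.txt", "h1x", "c.txt", "h3");
        System.out.println(diff(snap1, snap2));
    }
}
```

The cost of this diff is proportional to the number of keys in the two snapshots being compared, which is why it avoids the full-bucket listing that plain distcp needs, but it still only runs as often as snapshots are taken.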
> This ticket is to track possible solutions for realtime asynchronous bucket
> replication between two clusters in different regions (with 100+ ms latency).
> The current idea is to have a CDC (Change Data Capture) framework on the OM.
> Changes are sent to a Replication Queue and processed by a Replicator that
> applies them to the target cluster.
> * The CDC component can subscribe to and receive any updates for the buckets
> through gRPC bidirectional streaming APIs
> ** The choice of the "change data" can be either Raft logs / RocksDB WAL
> logs / OM Audit logs
> * The Replication Queue can be something like a Kafka topic / Ratis
> LogService (https://issues.apache.org/jira/browse/RATIS-271)
> * The Replicator uses a master-worker pattern
> ** The master will consume replication events from the replication queue and
> assign a worker to run the cross-region replication task
> *** If any worker fails, the master will reassign the replication task to
> another worker
> ** The worker will run the replication task assigned by the master
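The pipeline above can be sketched end to end as a small in-process simulation. All names here are hypothetical: a java.util.concurrent.BlockingQueue stands in for the Kafka topic / Ratis LogService, and an ExecutorService stands in for the worker pool; a real worker would copy the key to the target cluster over the WAN.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch of the proposed Replicator: CDC events land on a
// replication queue, a master consumes them and hands each to a worker,
// and failed events are re-enqueued so another worker can retry them.
public class ReplicatorSketch {
    public record ChangeEvent(long sequenceId, String key) {}

    // Stand-in for the cross-region copy performed by a worker.
    public interface Worker { void replicate(ChangeEvent event) throws Exception; }

    private final BlockingQueue<ChangeEvent> queue = new LinkedBlockingQueue<>();
    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    private final Set<Long> replicated = ConcurrentHashMap.newKeySet();

    public void submit(ChangeEvent event) { queue.add(event); }

    public Set<Long> replicatedIds() { return replicated; }

    // Master loop: drain the queue, assign each event to a worker, and
    // re-enqueue on failure so the event is retried by another worker.
    public void runMaster(Worker worker, int expected) throws InterruptedException {
        while (replicated.size() < expected) {
            ChangeEvent event = queue.poll(100, TimeUnit.MILLISECONDS);
            if (event == null) continue;
            workers.submit(() -> {
                try {
                    worker.replicate(event);
                    replicated.add(event.sequenceId());
                } catch (Exception e) {
                    queue.add(event); // reassign: another worker picks it up
                }
            });
        }
        workers.shutdown();
    }

    public static void main(String[] args) throws Exception {
        ReplicatorSketch replicator = new ReplicatorSketch();
        for (long i = 1; i <= 5; i++) replicator.submit(new ChangeEvent(i, "key-" + i));
        // Flaky worker: fails the first attempt for each event, succeeds on retry.
        Set<Long> seen = ConcurrentHashMap.newKeySet();
        replicator.runMaster(ev -> {
            if (seen.add(ev.sequenceId())) throw new Exception("transient failure");
        }, 5);
        System.out.println("replicated=" + replicator.replicatedIds().size());
    }
}
```

A real implementation would additionally need ordering guarantees per key (the sequence id from the Raft log / WAL can provide this) and idempotent replays, since a retried event may have partially applied.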
--
This message was sent by Atlassian Jira
(v8.20.10#820010)