[
https://issues.apache.org/jira/browse/HDDS-12307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925887#comment-17925887
]
Ivan Andika commented on HDDS-12307:
------------------------------------
This is still in a very early phase. Any ideas welcome.
> Realtime Cross-Region Bucket Replication
> ----------------------------------------
>
> Key: HDDS-12307
> URL: https://issues.apache.org/jira/browse/HDDS-12307
> Project: Apache Ozone
> Issue Type: New Feature
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Major
>
> Currently, there are a few cross-region (geo-replicated) DR solutions for
> Ozone buckets:
> * Run a periodic distcp from the source bucket to the target bucket
> * Take a snapshot of the bucket and send it to the remote DR site
> ([https://ozone.apache.org/docs/edge/feature/snapshot.html])
> There are pros and cons to the current approaches:
> * Pros
> ** It is simple: periodic jobs can be set up quite easily (e.g. using
> cron jobs)
> ** DistCp sets up a MapReduce job that parallelizes the copy from the
> source cluster to the target cluster
> ** No additional components needed
> * Cons
> ** It is not "realtime": the freshness of the data depends on how
> frequently and how fast the distcp runs
> ** It incurs significant overhead on the source cluster: it requires
> scanning all the files in the source cluster (and possibly in the target
> cluster)
> Cloudera Replication Manager
> ([https://docs.cloudera.com/replication-manager/1.5.4/replication-policies/topics/rm-pvce-understand-ozone-replication-policy.html])
> adds incremental replication support after the initial bootstrap step by
> taking the snapdiff between two snapshots. This is better since there is no
> need to list all the keys under the bucket again, but it's still not
> technically realtime.
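To make the snapdiff idea concrete, here is a minimal, self-contained sketch in plain Java. All types here are hypothetical stand-ins, not Ozone's actual snapshot-diff API: each snapshot is modeled as a key-to-content-hash map, and the diff yields only the keys that need to be re-copied.

```java
import java.util.*;

// Hypothetical sketch: compute the diff between two bucket snapshots so
// only changed keys need to be replicated, instead of listing the whole
// bucket on every run.
public class SnapDiffSketch {
    public enum ChangeType { CREATE, MODIFY, DELETE }

    public record Change(String key, ChangeType type) {}

    // Each snapshot is modeled as key -> content hash.
    public static List<Change> diff(Map<String, String> from, Map<String, String> to) {
        List<Change> changes = new ArrayList<>();
        for (Map.Entry<String, String> e : to.entrySet()) {
            String old = from.get(e.getKey());
            if (old == null) {
                changes.add(new Change(e.getKey(), ChangeType.CREATE));
            } else if (!old.equals(e.getValue())) {
                changes.add(new Change(e.getKey(), ChangeType.MODIFY));
            }
        }
        for (String key : from.keySet()) {
            if (!to.containsKey(key)) {
                changes.add(new Change(key, ChangeType.DELETE));
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        // snap1 -> snap2: a.txt modified, b.txt deleted, c.txt created
        Map<String, String> snap1 = Map.of("a.txt", "h1", "b.txt", "h2");
        Map<String, String> snap2 = Map.of("a.txt", "h1x", "c.txt", "h3");
        System.out.println(diff(snap1, snap2));
    }
}
```

The cost of this diff is proportional to the number of keys in the two snapshots being compared, which is why it avoids the full-bucket listing that plain distcp needs, but it still only runs as often as snapshots are taken.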
> This ticket is to track possible solutions for realtime asynchronous bucket
> replication between two clusters in different regions (with 100+ ms latency).
> The current idea is to have a CDC (Change Data Capture) framework on the OM.
> Changes are sent to a Replication Queue and processed by a Replicator that
> applies them to the target cluster.
> * The CDC component can subscribe to and receive any updates for the buckets
> through gRPC bidirectional streaming APIs
> ** The choice of the "change data" can be either Raft logs / RocksDB WAL
> logs / OM Audit logs
> * The Replication Queue can be something like a Kafka topic / Ratis
> LogService (https://issues.apache.org/jira/browse/RATIS-271)
> * The Replicator uses a master-worker pattern
> ** The master will consume replication events from the replication queue and
> assign a worker to run the cross-region replication task
> *** If any worker fails, the master will reassign the replication task to
> another worker
> ** The worker will run the replication task assigned by the master
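The pipeline above can be sketched end to end as a small in-process simulation. All names here are hypothetical: a java.util.concurrent.BlockingQueue stands in for the Kafka topic / Ratis LogService, and an ExecutorService stands in for the worker pool; a real worker would copy the key to the target cluster over the WAN.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch of the proposed Replicator: CDC events land on a
// replication queue, a master consumes them and hands each to a worker,
// and failed events are re-enqueued so another worker can retry them.
public class ReplicatorSketch {
    public record ChangeEvent(long sequenceId, String key) {}

    // Stand-in for the cross-region copy performed by a worker.
    public interface Worker { void replicate(ChangeEvent event) throws Exception; }

    private final BlockingQueue<ChangeEvent> queue = new LinkedBlockingQueue<>();
    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    private final Set<Long> replicated = ConcurrentHashMap.newKeySet();

    public void submit(ChangeEvent event) { queue.add(event); }

    public Set<Long> replicatedIds() { return replicated; }

    // Master loop: drain the queue, assign each event to a worker, and
    // re-enqueue on failure so the event is retried by another worker.
    public void runMaster(Worker worker, int expected) throws InterruptedException {
        while (replicated.size() < expected) {
            ChangeEvent event = queue.poll(100, TimeUnit.MILLISECONDS);
            if (event == null) continue;
            workers.submit(() -> {
                try {
                    worker.replicate(event);
                    replicated.add(event.sequenceId());
                } catch (Exception e) {
                    queue.add(event); // reassign: another worker picks it up
                }
            });
        }
        workers.shutdown();
    }

    public static void main(String[] args) throws Exception {
        ReplicatorSketch replicator = new ReplicatorSketch();
        for (long i = 1; i <= 5; i++) replicator.submit(new ChangeEvent(i, "key-" + i));
        // Flaky worker: fails the first attempt for each event, succeeds on retry.
        Set<Long> seen = ConcurrentHashMap.newKeySet();
        replicator.runMaster(ev -> {
            if (seen.add(ev.sequenceId())) throw new Exception("transient failure");
        }, 5);
        System.out.println("replicated=" + replicator.replicatedIds().size());
    }
}
```

A real implementation would additionally need ordering guarantees per key (the sequence id from the Raft log / WAL can provide this) and idempotent replays, since a retried event may have partially applied.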
--
This message was sent by Atlassian Jira
(v8.20.10#820010)