Duplication Kafka/CDC projects

Marton Greber Tue, 14 Nov 2023 07:10:48 -0800

Devs,


We have been tinkering with a proof of concept (POC) for cross-cluster
async replication. The use case is to have a backup Kudu cluster
for disaster recovery. The new replication feature could be treated as an
alternative to backup/restore, but with finer time granularity. Moreover, it
would eliminate the need for intermediate storage.


For our POC we have been taking inspiration from YugaByte xCluster
replication
<https://docs.yugabyte.com/preview/architecture/docdb-replication/async-replication/>
(the active-passive part). This would be a change data capture (CDC) based
approach, with Kudu CDC producers and consumers.


On the other hand, while looking at https://gerrit.cloudera.org/c/19909/
"Support write ops to kafka with kafka client", I found some
similarities. There, as I understand it, the goal is to
move records from Kudu into Kafka.


I think these two projects overlap, and I wanted to start a conversation
about potential ways to consolidate them. Figuring out the commonalities
would help us avoid pushing in pieces of changes that are quite similar
(and bloating the codebase).


—


Some initial thoughts:

   - The need for a CDC interface for Kudu emerges as a commonality.
   - https://gerrit.cloudera.org/c/19909/ could avoid adding the Kafka
   client into Kudu by using the above CDC interface in, for example, a
   Kafka source connector.
      - This might be a better separation of concerns.
   - We could rethink our CDC approach so that it is a generic interface
   rather than a Kudu-to-Kudu specific one.
   - For our purpose, we could initially reuse the Kudu -> Kafka (with
   connector) path, and implement async replication by implementing the
   other end: the Kafka sink connector.
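
To make the "generic interface" idea a bit more concrete, here is a minimal
sketch in Python. It is illustration only: every name in it (ChangeEvent,
ChangeSink, InMemoryCdcSource, TopicSink, ReplicaSink, replicate) is
hypothetical, and nothing here is an existing Kudu or Kafka API. The point
is that one change stream can feed either a Kafka-like log or a replica
table through the same sink interface.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional, Protocol, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """One row-level change. All field names are hypothetical."""
    table: str
    op: str               # "UPSERT" or "DELETE"
    key: str
    row: Optional[dict]   # None for deletes
    sequence: int         # increasing position in the change stream


class ChangeSink(Protocol):
    """Anything that can consume change events (hypothetical interface)."""
    def apply(self, event: ChangeEvent) -> None: ...


class InMemoryCdcSource:
    """Stand-in for a CDC producer: serves events after a given sequence."""
    def __init__(self, events: Iterable[ChangeEvent]) -> None:
        self._events = sorted(events, key=lambda e: e.sequence)

    def poll(self, after: int) -> Iterator[ChangeEvent]:
        return (e for e in self._events if e.sequence > after)


class TopicSink:
    """Stand-in for the Kafka side: appends every event to a log."""
    def __init__(self) -> None:
        self.records: List[ChangeEvent] = []

    def apply(self, event: ChangeEvent) -> None:
        self.records.append(event)


class ReplicaSink:
    """Stand-in for a replica cluster: keeps the latest row per key."""
    def __init__(self) -> None:
        self.rows: Dict[Tuple[str, str], dict] = {}

    def apply(self, event: ChangeEvent) -> None:
        k = (event.table, event.key)
        if event.op == "DELETE":
            self.rows.pop(k, None)
        else:
            self.rows[k] = dict(event.row or {})


def replicate(source: InMemoryCdcSource, sink: ChangeSink,
              checkpoint: int = 0) -> int:
    """Drain events newer than `checkpoint` into `sink`.

    Returns the new checkpoint so a later call can resume from it.
    """
    for event in source.poll(after=checkpoint):
        sink.apply(event)
        checkpoint = event.sequence
    return checkpoint
```

With something of this shape, the Kafka path and the Kudu-to-Kudu path
differ only in which sink is plugged in, and the returned checkpoint is
what a consumer would persist in order to resume after a restart.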

—


Let me know your thoughts on this one!


Thanks,
Marton
