vinothchandar commented on a change in pull request #4481:
URL: https://github.com/apache/hudi/pull/4481#discussion_r779915089



##########
File path: website/docs/hoodie_deltastreamer.md
##########
@@ -336,6 +336,29 @@ jobs: `hoodie.write.meta.key.prefixes = 'deltastreamer.checkpoint.key'`
 Spark SQL should be configured using this hoodie config:
 hoodie.deltastreamer.source.sql.sql.query = 'select * from source_table'
 
+### Debezium Sources
+Debezium is an open source distributed platform for change data capture (CDC). Hudi has both a PostgresDebeziumSource and a

Review comment:
       can we specify the full class name for these sources? 
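For reference, a hedged guess at the fully-qualified names (based on the `hudi-utilities` package layout; please verify against the current codebase before documenting) would be:

```
org.apache.hudi.utilities.sources.debezium.PostgresDebeziumSource
org.apache.hudi.utilities.sources.debezium.MysqlDebeziumSource
```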

##########
File path: website/docs/hoodie_deltastreamer.md
##########
@@ -336,6 +336,29 @@ jobs: `hoodie.write.meta.key.prefixes = 'deltastreamer.checkpoint.key'`
 Spark SQL should be configured using this hoodie config:
 hoodie.deltastreamer.source.sql.sql.query = 'select * from source_table'
 
+### Debezium Sources
+Debezium is an open source distributed platform for change data capture (CDC). Hudi has both a PostgresDebeziumSource
+and a MysqlDebeziumSource. With these sources, we can continuously capture row-level changes that insert, update and
+delete records committed to a MySQL or Postgres database, and seamlessly apply these changes to Hudi tables.
+
+[Debezium](https://debezium.io/documentation/reference/stable/connectors/postgresql.html) is implemented as a Kafka
+Connect source that reads change logs from databases ([logical decoding](https://www.postgresql.org/docs/current/logicaldecoding-explanation.html)
+in PostgreSQL and the `binlog` in MySQL) and ingests them into a Kafka topic. Debezium uses a single Kafka topic per
+table in the source database.
+
+![debezium](/assets/images/debezium_arch.png)
+
+The connector generates data change event records and streams them to Kafka topics. For each table, the default
+behavior is that the connector streams all generated events to a separate Kafka topic for that table. In addition,
+Debezium registers the schema of the change events with a schema registry, such as the Confluent schema registry.
+
+#### Configuration
+#### --@Rajesh, we need to make this more specific to the code and put some sample code

Review comment:
       we probably need to clean this up more?
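Agreed. As a starting point for the sample code, a hedged sketch of a DeltaStreamer invocation is below; the class names, property keys, and the `_event_lsn` ordering field are assumptions drawn from the Debezium source implementation and should be double-checked against the code before landing:

```sh
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  hudi-utilities-bundle.jar \
  --table-type COPY_ON_WRITE \
  --source-class org.apache.hudi.utilities.sources.debezium.PostgresDebeziumSource \
  --payload-class org.apache.hudi.common.model.debezium.PostgresDebeziumAvroPayload \
  --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
  --source-ordering-field _event_lsn \
  --target-base-path /path/to/hudi/table \
  --target-table my_table \
  --hoodie-conf bootstrap.servers=localhost:9092 \
  --hoodie-conf hoodie.deltastreamer.schemaprovider.registry.url=http://localhost:8081/subjects/my_topic-value/versions/latest \
  --hoodie-conf hoodie.deltastreamer.source.kafka.topic=postgres.public.my_table
```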




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

