Copilot commented on code in PR #4247:
URL: https://github.com/apache/flink-cdc/pull/4247#discussion_r2851806173


##########
docs/content/docs/connectors/flink-sources/mongodb-cdc.md:
##########
@@ -512,6 +512,63 @@ Applications can use change streams to subscribe to all 
data changes on a single
By the way, Debezium's MongoDB change streams exploration mentioned by [DBZ-435](https://issues.redhat.com/browse/DBZ-435) is on the roadmap.<br> 
If it's done, we can consider integrating the two kinds of source connectors for users to choose from.
 
+### Scan Newly Added Tables
+
+**Note:** This feature is available since Flink CDC 3.1.0.
+
+The Scan Newly Added Tables feature enables you to add new collections to monitor for an existing running pipeline. The newly added collections will first read their snapshot data and then automatically read their change stream.
+
+Imagine this scenario: At the beginning, a Flink job monitors collections `[product, user, address]`, but after some days we would like the job to also monitor collections `[order, custom]` which contain historical data, and we need the job to still reuse its existing state. This feature can resolve this case gracefully.
+
+The following operations show how to enable this feature to resolve the above scenario. Suppose an existing Flink job uses a MongoDB CDC Source like this:
+
+```java
+    MongoDBSource<String> mongoSource = MongoDBSource.<String>builder()
+        .hosts("yourHostname:27017")
+        .databaseList("db") // set captured database
+        .collectionList("db.product", "db.user", "db.address") // set captured collections
+        .username("yourUsername")
+        .password("yourPassword")
+        .scanNewlyAddedTableEnabled(true) // enable the scan newly added collections feature
+        .deserializer(new JsonDebeziumDeserializationSchema()) // converts SourceRecord to JSON String
+        .build();
+    // your business code
+```
+
+If we would like to add new collections `[order, custom]` to an existing Flink 
job, we just need to update the `collectionList()` value of the job to include 
`[order, custom]` and restore the job from previous savepoint.
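
For illustration, the updated builder for the restored job might look like the sketch below; it reuses the builder calls from the example above, and the fully-qualified names `db.order` and `db.custom` are assumed to follow the same `database.collection` convention as the existing entries:

```java
    MongoDBSource<String> mongoSource = MongoDBSource.<String>builder()
        .hosts("yourHostname:27017")
        .databaseList("db")
        // previous collections plus the newly added ones (assumed names db.order, db.custom)
        .collectionList("db.product", "db.user", "db.address", "db.order", "db.custom")
        .username("yourUsername")
        .password("yourPassword")
        .scanNewlyAddedTableEnabled(true)
        .deserializer(new JsonDebeziumDeserializationSchema())
        .build();
```

Restarting from the previous savepoint with this configuration lets the source snapshot only the two new collections before tailing their change streams.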

Review Comment:
   This example says new collections are `[order, custom]`, but the code uses 
fully-qualified names in `collectionList()` (e.g., `db.order`, `db.custom`). To 
avoid ambiguity, please update the text to use the same fully-qualified 
collection names.



##########
docs/content/docs/connectors/flink-sources/postgres-cdc.md:
##########
@@ -511,6 +511,71 @@ The config option `scan.startup.mode` specifies the 
startup mode for PostgreSQL
 - `committed-offset`: Skip snapshot phase and start reading events from a 
`confirmed_flush_lsn` offset of replication slot.
 - `snapshot`: Only the snapshot phase is performed and exits after the 
snapshot phase reading is completed.
 
+### Scan Newly Added Tables
+
+**Note:** This feature is available since Flink CDC 3.1.0.
+
+The Scan Newly Added Tables feature enables you to add new tables to monitor for an existing running pipeline. The newly added tables will first read their snapshot data and then automatically read their WAL (Write-Ahead Log) changes through the replication slot.
+
+Imagine this scenario: At the beginning, a Flink job monitors tables 
`[product, user, address]`, but after some days we would like the job can also 
monitor tables `[order, custom]` which contain history data, and we need the 
job can still reuse existing state of the job. This feature can resolve this 
case gracefully.
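
A sketch of how this might look with the Postgres CDC incremental source follows; the builder and method names are taken from the Postgres CDC connector's incremental source API and should be verified against the connector documentation, and the host, slot, and table names are placeholders:

```java
PostgresIncrementalSource<String> postgresSource =
    PostgresSourceBuilder.PostgresIncrementalSource.<String>builder()
        .hostname("yourHostname")
        .port(5432)
        .database("postgres")    // database to capture
        .schemaList("public")    // schemas to capture
        .tableList("public.product", "public.user", "public.address") // fully-qualified tables
        .username("yourUsername")
        .password("yourPassword")
        .slotName("flink")                // replication slot used for logical decoding
        .decodingPluginName("pgoutput")   // logical decoding output plugin
        .scanNewlyAddedTableEnabled(true) // enable scanning newly added tables
        .deserializer(new JsonDebeziumDeserializationSchema())
        .build();
// as with the other connectors, add new fully-qualified names to tableList()
// and restore the job from a previous savepoint
```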

Review Comment:
   This sentence is ungrammatical ("we would like the job can also monitor"). 
Please rephrase to something like "we would like the job to also monitor ..." 
to improve readability.
   ```suggestion
   Imagine this scenario: At the beginning, a Flink job monitors tables 
`[product, user, address]`, but after some days we would like the job to also 
monitor tables `[order, custom]` which contain historical data, and we need the 
job to still reuse the existing state of the job. This feature can resolve this 
case gracefully.
   ```



##########
docs/content/docs/connectors/flink-sources/mongodb-cdc.md:
##########
@@ -512,6 +512,63 @@ Applications can use change streams to subscribe to all 
data changes on a single
By the way, Debezium's MongoDB change streams exploration mentioned by [DBZ-435](https://issues.redhat.com/browse/DBZ-435) is on the roadmap.<br> 
If it's done, we can consider integrating the two kinds of source connectors for users to choose from.
 
+### Scan Newly Added Tables

Review Comment:
   The section title says "Tables", but MongoDB terminology throughout this 
section is "collections". Consider renaming the heading (or using 
"Tables/Collections") to avoid confusing MongoDB users.
   ```suggestion
   ### Scan Newly Added Collections
   ```



##########
docs/content/docs/connectors/flink-sources/oracle-cdc.md:
##########
@@ -559,6 +559,67 @@ _Note: the mechanism of `scan.startup.mode` option relying 
on Debezium's `snapsh
 
The Oracle CDC source can't work in parallel reading, because only one task can receive change events.
 
+### Scan Newly Added Tables
+
+**Note:** This feature is available since Flink CDC 3.1.0.
+
+The Scan Newly Added Tables feature enables you to add new tables to monitor for an existing running pipeline. The newly added tables will read their snapshot data first and then read their redo log automatically.
+
+Imagine this scenario: At the beginning, a Flink job monitors tables 
`[product, user, address]`, but after some days we would like the job can also 
monitor tables `[order, custom]` which contain history data, and we need the 
job can still reuse existing state of the job. This feature can resolve this 
case gracefully.
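
A corresponding sketch for the Oracle CDC incremental source follows; the builder and method names follow the Oracle CDC connector's incremental source API and should be verified against the connector documentation, and the connection details and `DEBEZIUM.*` table names are placeholders:

```java
OracleIncrementalSource<String> oracleSource =
    OracleSourceBuilder.OracleIncrementalSource.<String>builder()
        .hostname("yourHostname")
        .port(1521)
        .databaseList("ORCLCDB")  // database to capture
        .schemaList("DEBEZIUM")   // schemas to capture
        .tableList("DEBEZIUM.PRODUCT", "DEBEZIUM.USER", "DEBEZIUM.ADDRESS") // fully-qualified tables
        .username("yourUsername")
        .password("yourPassword")
        .scanNewlyAddedTableEnabled(true) // enable scanning newly added tables
        .deserializer(new JsonDebeziumDeserializationSchema())
        .build();
// to add tables later, extend tableList() and restore the job from a savepoint
```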

Review Comment:
   This sentence is ungrammatical ("we would like the job can also monitor"). 
Please rephrase to something like "we would like the job to also monitor ...".
   ```suggestion
   Imagine this scenario: At the beginning, a Flink job monitors tables 
`[product, user, address]`, but after some days we would like the job to also 
monitor tables `[order, custom]` which contain historical data, and we need the 
job to still reuse the existing state. This feature can resolve this case 
gracefully.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
