JingsongLi commented on code in PR #1809:
URL: https://github.com/apache/incubator-paimon/pull/1809#discussion_r1295490228
##########
docs/content/how-to/cdc-ingestion.md:
##########
@@ -415,6 +417,103 @@ Synchronization from multiple Kafka topics to Paimon
database.
--table-conf changelog-producer=input \
--table-conf sink.parallelism=4
```
+## MongoDB
+
+### Prepare MongoDB Bundled Jar
+
+```
+flink-sql-connector-mongodb-*.jar
+```
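+
+A hedged sketch of a typical setup (the paths below are placeholders, assuming the common Flink practice of placing connector jars under `${FLINK_HOME}/lib`):
+
+```bash
+# Make the MongoDB connector bundle available on the Flink classpath,
+# then restart the cluster so it is picked up.
+cp /path/to/flink-sql-connector-mongodb-*.jar ${FLINK_HOME}/lib/
+```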
+
+### Synchronizing Tables
+
+By using [MongoDBSyncTableAction](/docs/{{< param Branch >}}/api/java/org/apache/paimon/flink/action/cdc/mongodb/MongoDBSyncTableAction) in a Flink DataStream job or directly through `flink run`, users can synchronize one collection from MongoDB into one Paimon table.
+
+To use this feature through `flink run`, run the following shell command.
+
+```bash
+<FLINK_HOME>/bin/flink run \
+ /path/to/paimon-flink-action-{{< version >}}.jar \
+ mongodb-sync-table \
+ --warehouse <warehouse-path> \
+ --database <database-name> \
+ --table <table-name> \
+ [--partition-keys <partition-keys>] \
+ [--mongodb-conf <mongodb-cdc-source-conf> [--mongodb-conf <mongodb-cdc-source-conf> ...]] \
+ [--catalog-conf <paimon-catalog-conf> [--catalog-conf <paimon-catalog-conf> ...]] \
+ [--table-conf <paimon-table-sink-conf> [--table-conf <paimon-table-sink-conf> ...]]
+```
+
+{{< generated/mongodb_sync_table >}}
+
+If the Paimon table you specify does not exist, this action will automatically create the table. Its schema will be derived from the specified MongoDB collection.
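+
+For intuition, a hedged illustration (the sample document and the DDL below are illustrative assumptions, not output of this action): since MongoDB documents are schemaless, the auto-created table typically gets one column per top-level field, with `_id` as the primary key:
+
+```
+-- Illustrative only: for a source document like
+--   { "_id": ObjectId("..."), "name": "alice", "age": 18, "pt": "20230801" }
+-- the auto-created Paimon table would look roughly like
+CREATE TABLE test_table (
+  _id  STRING,
+  name STRING,
+  age  STRING,
+  pt   STRING,
+  PRIMARY KEY (_id) NOT ENFORCED
+) PARTITIONED BY (pt);
+```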
+
+Example 1: synchronize one collection into one Paimon table
+
+```bash
+<FLINK_HOME>/bin/flink run \
+ /path/to/paimon-flink-action-{{< version >}}.jar \
+ mongodb-sync-table \
+ --warehouse hdfs:///path/to/warehouse \
+ --database test_db \
+ --table test_table \
+ --partition-keys pt \
+ --mongodb-conf hosts=127.0.0.1:27017 \
+ --mongodb-conf username=root \
+ --mongodb-conf password=123456 \
+ --mongodb-conf database='source_db' \
+ --mongodb-conf collection='source_table1' \
+ --catalog-conf metastore=hive \
+ --catalog-conf uri=thrift://hive-metastore:9083 \
+ --table-conf bucket=4 \
+ --table-conf changelog-producer=input \
+ --table-conf sink.parallelism=4
+```
+
+### Synchronizing Databases
+
+By using [MongoDBSyncDatabaseAction](/docs/{{< param Branch >}}/api/java/org/apache/paimon/flink/action/cdc/mongodb/MongoDBSyncDatabaseAction) in a Flink DataStream job or directly through `flink run`, users can synchronize the whole MongoDB database into one Paimon database.
+
+To use this feature through `flink run`, run the following shell command.
+
+```bash
+<FLINK_HOME>/bin/flink run \
+ /path/to/paimon-flink-action-{{< version >}}.jar \
+ mongodb-sync-database \
+ --warehouse <warehouse-path> \
+ --database <database-name> \
+ [--table-prefix <paimon-table-prefix>] \
+ [--table-suffix <paimon-table-suffix>] \
+ [--including-tables <mongodb-table-name|name-regular-expr>] \
+ [--excluding-tables <mongodb-table-name|name-regular-expr>] \
+ [--mongodb-conf <mongodb-cdc-source-conf> [--mongodb-conf <mongodb-cdc-source-conf> ...]] \
+ [--catalog-conf <paimon-catalog-conf> [--catalog-conf <paimon-catalog-conf> ...]] \
+ [--table-conf <paimon-table-sink-conf> [--table-conf <paimon-table-sink-conf> ...]]
+```
+
+{{< generated/mongodb_sync_database >}}
+
+All collections to be synchronized need to have `_id` as their primary key.
+For each MongoDB collection to be synchronized, if the corresponding Paimon table does not exist, this action will automatically create the table. Its schema will be derived from the corresponding MongoDB collection. If the Paimon table already exists, its schema will be compared against the schema of the corresponding MongoDB collection.
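+
+A hedged example, modeled on the table-sync example above (host, credentials, and database names are placeholders, not from this PR):
+
+```bash
+<FLINK_HOME>/bin/flink run \
+ /path/to/paimon-flink-action-{{< version >}}.jar \
+ mongodb-sync-database \
+ --warehouse hdfs:///path/to/warehouse \
+ --database test_db \
+ --mongodb-conf hosts=127.0.0.1:27017 \
+ --mongodb-conf username=root \
+ --mongodb-conf password=123456 \
+ --mongodb-conf database='source_db' \
+ --catalog-conf metastore=hive \
+ --catalog-conf uri=thrift://hive-metastore:9083 \
+ --table-conf bucket=4 \
+ --table-conf changelog-producer=input \
+ --table-conf sink.parallelism=4
+```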
Review Comment:
If a new table appears during the synchronization process, will it be automatically created?
We should state this behavior in the document.