JingsongLi commented on code in PR #1809:
URL: https://github.com/apache/incubator-paimon/pull/1809#discussion_r1295490228
##########
docs/content/how-to/cdc-ingestion.md:
##########
@@ -415,6 +417,103 @@ Synchronization from multiple Kafka topics to Paimon
database.
--table-conf changelog-producer=input \
--table-conf sink.parallelism=4
```
+## MongoDB
+
+### Prepare MongoDB Bundled Jar
+
+```
+flink-sql-connector-mongodb-*.jar
+```
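+
+A hedged sketch of a typical setup (the paths below are placeholders, assuming the common Flink practice of placing connector jars under `${FLINK_HOME}/lib`):
+
+```bash
+# Make the MongoDB connector bundle available on the Flink classpath,
+# then restart the cluster so it is picked up.
+cp /path/to/flink-sql-connector-mongodb-*.jar ${FLINK_HOME}/lib/
+```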
+
+### Synchronizing Tables
+
+By using [MongoDBSyncTableAction](/docs/{{< param Branch >}}/api/java/org/apache/paimon/flink/action/cdc/mongodb/MongoDBSyncTableAction) in a Flink DataStream job or directly through `flink run`, users can synchronize one collection from MongoDB into one Paimon table.
+
+To use this feature through `flink run`, run the following shell command.
+
+```bash
+<FLINK_HOME>/bin/flink run \
+ /path/to/paimon-flink-action-{{< version >}}.jar \
+ mongodb-sync-table \
+ --warehouse <warehouse-path> \
+ --database <database-name> \
+ --table <table-name> \
+ [--partition-keys <partition-keys>] \
+ [--mongodb-conf <mongodb-cdc-source-conf> [--mongodb-conf <mongodb-cdc-source-conf> ...]] \
+ [--catalog-conf <paimon-catalog-conf> [--catalog-conf <paimon-catalog-conf> ...]] \
+ [--table-conf <paimon-table-sink-conf> [--table-conf <paimon-table-sink-conf> ...]]
+```
+
+{{< generated/mongodb_sync_table >}}
+
+If the Paimon table you specify does not exist, this action will automatically create the table. Its schema will be derived from the specified MongoDB collection.
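+
+For intuition, a hedged illustration (the sample document and the DDL below are illustrative assumptions, not output of this action): since MongoDB documents are schemaless, the auto-created table typically gets one column per top-level field, with `_id` as the primary key:
+
+```
+-- Illustrative only: for a source document like
+--   { "_id": ObjectId("..."), "name": "alice", "age": 18, "pt": "20230801" }
+-- the auto-created Paimon table would look roughly like
+CREATE TABLE test_table (
+  _id  STRING,
+  name STRING,
+  age  STRING,
+  pt   STRING,
+  PRIMARY KEY (_id) NOT ENFORCED
+) PARTITIONED BY (pt);
+```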
+
+Example 1: synchronize one collection into one Paimon table
+
+```bash
+<FLINK_HOME>/bin/flink run \
+ /path/to/paimon-flink-action-{{< version >}}.jar \
+ mongodb-sync-table \
+ --warehouse hdfs:///path/to/warehouse \
+ --database test_db \
+ --table test_table \
+ --partition-keys pt \
+ --mongodb-conf hosts=127.0.0.1:27017 \
+ --mongodb-conf username=root \
+ --mongodb-conf password=123456 \
+ --mongodb-conf database='source_db' \
+ --mongodb-conf collection='source_table1' \
+ --catalog-conf metastore=hive \
+ --catalog-conf uri=thrift://hive-metastore:9083 \
+ --table-conf bucket=4 \
+ --table-conf changelog-producer=input \
+ --table-conf sink.parallelism=4
+```
+
+### Synchronizing Databases
+
+By using [MongoDBSyncDatabaseAction](/docs/{{< param Branch >}}/api/java/org/apache/paimon/flink/action/cdc/mongodb/MongoDBSyncDatabaseAction) in a Flink DataStream job or directly through `flink run`, users can synchronize the whole MongoDB database into one Paimon database.
+
+To use this feature through `flink run`, run the following shell command.
+
+```bash
+<FLINK_HOME>/bin/flink run \
+ /path/to/paimon-flink-action-{{< version >}}.jar \
+ mongodb-sync-database \
+ --warehouse <warehouse-path> \
+ --database <database-name> \
+ [--table-prefix <paimon-table-prefix>] \
+ [--table-suffix <paimon-table-suffix>] \
+ [--including-tables <mongodb-table-name|name-regular-expr>] \
+ [--excluding-tables <mongodb-table-name|name-regular-expr>] \
+ [--mongodb-conf <mongodb-cdc-source-conf> [--mongodb-conf <mongodb-cdc-source-conf> ...]] \
+ [--catalog-conf <paimon-catalog-conf> [--catalog-conf <paimon-catalog-conf> ...]] \
+ [--table-conf <paimon-table-sink-conf> [--table-conf <paimon-table-sink-conf> ...]]
+```
+
+{{< generated/mongodb_sync_database >}}
+
+All collections to be synchronized need to have `_id` as their primary key.
+For each MongoDB collection to be synchronized, if the corresponding Paimon table does not exist, this action will automatically create the table. Its schema will be derived from the corresponding MongoDB collection. If the Paimon table already exists, its schema will be compared against the schema of the corresponding MongoDB collection.
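+
+A hedged example, modeled on the table-sync example above (host, credentials, and database names are placeholders, not from this PR):
+
+```bash
+<FLINK_HOME>/bin/flink run \
+ /path/to/paimon-flink-action-{{< version >}}.jar \
+ mongodb-sync-database \
+ --warehouse hdfs:///path/to/warehouse \
+ --database test_db \
+ --mongodb-conf hosts=127.0.0.1:27017 \
+ --mongodb-conf username=root \
+ --mongodb-conf password=123456 \
+ --mongodb-conf database='source_db' \
+ --catalog-conf metastore=hive \
+ --catalog-conf uri=thrift://hive-metastore:9083 \
+ --table-conf bucket=4 \
+ --table-conf changelog-producer=input \
+ --table-conf sink.parallelism=4
+```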
Review Comment:
If a new table appears during the synchronization process, will it be automatically created?
We should state this behavior in the document.