ywhandzmx opened a new issue, #6665:
URL: https://github.com/apache/paimon/issues/6665

   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/paimon/issues) 
and found nothing similar.
   
   
   ### Paimon version
   
   I am using Paimon CDC to sync an entire MongoDB database into Paimon's Hive tables. The source reads several million records per minute, but the sink outputs only a few hundred. Tuning several parameters had little effect.
     ./bin/flink run \
     -p 20 \
     -D execution.checkpointing.interval=30000 \
     lib/paimon-flink-action*.jar \
     mongodb-sync-database \
     --mongodb_conf hosts=192.168.xxxx:27017 \
     --mongodb_conf database=xx \
     --mongodb_conf username=xxx \
     --mongodb_conf password=xx \
     --mongodb_conf connection.options='authSource=xx' \
     --excluding_tables xxx.* \
     --warehouse /tmp/xxx/warehouse \
     --database mind01 \
     --catalog_conf metastore=filesystem \
     --table_conf bucket=10 \
     --table_conf sink.parallelism=20 \
     --table_conf write.parallelism=20 \
     --table_conf changelog-producer=input \
     --table_conf write-buffer-size=512mb \
     --table_conf write-buffer-spillable=true \
     --table_conf write-buffer-spillable-threshold=5 \
     --table_conf compaction.max.file-num=30 \
     --table_conf compaction.max.size=1gb \
     --table_conf compaction.early-max.file-num=20 \
     --table_conf num-sorted-run.stop-trigger=10 \
     --table_conf sort-spill-threshold=5 \
     --table_conf changelog-producer.lookup-wait=false \
     --table_conf bucket-key=_id
   
   ### Compute Engine
   
   Flink 1.20.3
   Hive 3.1.2
   JDK 1.8
   Hadoop 2.7.3
   Paimon 1.4
   
   ### Minimal reproduce step
   
   Following the steps on the official website, download the full Flink distribution, download the required Paimon dependencies into the `lib` directory, and start the job.
   
   ### What doesn't meet your expectations?
   
   Throughput is extremely low: the source reads several million records per minute, but the sink writes only a few hundred records per minute.
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!

