wangsvip opened a new issue, #3567: URL: https://github.com/apache/incubator-seatunnel/issues/3567
### Search before asking

- [X] I had searched in the [feature](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement.

### Description

Every day I run a scheduled job that pulls incremental data from MongoDB into Hive. At present, the MongoDB source can only perform a full extraction, so the incremental records have to be filtered out in the transform. This is inefficient: if a table holds 1 billion documents and grows by about 5,000 documents a day, the source must read all 1 billion documents just so the transform can keep 5,000 of them.

Our DBA rejected this approach because it consumes a lot of resources, and it would run once a day. With many tables, it could max out the CPU of the production database.

Therefore, I would like the source to accept a filter condition, so that it queries only the 5,000 incremental documents and exports them directly.

### Usage Scenario

_No response_

### Related issues

_No response_

### Are you willing to submit a PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
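As an illustration of the requested feature, here is a minimal sketch of the kind of filter a daily incremental job could hand to the MongoDB source. It assumes each document carries a last-modified timestamp in a field named `update_time`, which is a hypothetical name, and it builds a MongoDB Extended JSON range query for one UTC day, `[day 00:00, next day 00:00)`. How such a string would be passed to the connector (for example, through an option called something like `match.query`) is also an assumption here, not a description of an existing setting.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

public class IncrementalFilter {
    // Hypothetical field holding each document's last-modified timestamp.
    static final String TIME_FIELD = "update_time";

    /**
     * Builds a MongoDB query filter (Extended JSON) selecting documents whose
     * TIME_FIELD falls within the given UTC day: [day 00:00, next day 00:00).
     */
    static String dailyWindowFilter(LocalDate day) {
        // Instant.toString() yields ISO-8601 UTC, e.g. "2022-11-27T00:00:00Z".
        String start = day.atStartOfDay(ZoneOffset.UTC).toInstant().toString();
        String end = day.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant().toString();
        return String.format(
            "{\"%s\": {\"$gte\": {\"$date\": \"%s\"}, \"$lt\": {\"$date\": \"%s\"}}}",
            TIME_FIELD, start, end);
    }

    public static void main(String[] args) {
        // A daily scheduled job would typically extract yesterday's window.
        LocalDate yesterday = LocalDate.now(ZoneOffset.UTC).minusDays(1);
        System.out.println(dailyWindowFilter(yesterday));
    }
}
```

Because the filter is evaluated by MongoDB itself, only the matching documents leave the database; with an index on the timestamp field, the range query also avoids scanning the whole collection, which addresses the DBA's CPU concern.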
