wangsvip opened a new issue, #3567:
URL: https://github.com/apache/incubator-seatunnel/issues/3567

   ### Search before asking
   
   - [X] I had searched in the [feature](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement.
   
   
   ### Description
   
   I pull incremental data from MongoDB into Hive on a daily schedule. At
   present, the MongoDB source can only extract the full data set, so the
   incremental records have to be filtered out in the transform. This approach
   is inefficient. For example, if a table holds 1 billion records and grows by
   5,000 records per day, the source has to pull all 1 billion records and the
   transform then filters them down to 5,000. Our DBA objects to this approach
   because it consumes a lot of resources on every daily run, and with many
   tables it would max out the CPU of the production database. I would
   therefore like the source to accept a filter condition, so that I can query
   and export only the 5,000 incremental records directly.
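
To make the request concrete, here is a minimal sketch of the kind of filter condition the source could push down to MongoDB. It is not an existing SeaTunnel option: the function name `build_incremental_filter` and the field name `update_time` are hypothetical, and it assumes each record carries a timestamp of its last change. The resulting dictionary uses the standard MongoDB `$gte` and `$lt` query operators.

```python
from datetime import date, datetime, time, timedelta, timezone


def build_incremental_filter(day: date, field: str = "update_time") -> dict:
    """Build a MongoDB query filter for records changed during `day` (UTC).

    `field` is a hypothetical timestamp field; replace it with whatever
    field marks a record as new or changed in the actual collection.
    """
    start = datetime.combine(day, time.min, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    # Half-open range [start, end) so consecutive daily runs do not overlap.
    return {field: {"$gte": start, "$lt": end}}


if __name__ == "__main__":
    # Filter for one example day.
    print(build_incremental_filter(date(2022, 11, 25)))
```

With a filter like this evaluated on the MongoDB side, only the matching records would leave the database. A range condition of this kind is typically cheap only when the filtered field is indexed.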
   
   ### Usage Scenario
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
