xinbinhuang commented on a change in pull request #15680:
URL: https://github.com/apache/airflow/pull/15680#discussion_r627068858



##########
File path: airflow/providers/amazon/aws/transfers/mongo_to_s3.py
##########
@@ -117,7 +117,6 @@ def execute(self, context) -> bool:
                 mongo_collection=self.mongo_collection,
                 query=cast(dict, self.mongo_query),
                 mongo_db=self.mongo_db,
-                allowDiskUse=self.allow_disk_use,

Review comment:
       > I think whether we should set find(allow_disk_use=True) depends on 
what we want MongoToS3Operator.allow_disk_use to mean. 
   
   Though the wording seems quite different, they seem to achieve the same 
thing: _use temporary files to store data when it exceeds certain memory limits 
(both 100 MB RAM)_ - one is for aggregation stage while the other is for 
blocking sort operation. (internally, I guess they are pretty similar) But 
again, I'm not familiar with Mongo enough to be confident in what I said. It 
will be nice if someone already using Mongo can verify this.
   
   > But then the question becomes how we can pass it only to MongoDB (not 
pymongo!) 4.4+ (released in July 2020) because it would crash on earlier 
versions.
   
   I think a note in the docstring saying that it requires MongoDB 4.4+ for 
`.find` and a link to the doc should be enough. It's not sensible for us to 
control the version of an external system nor that we can. IMO, the users 
should know and control what MongoDB version they are using.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to