xinbinhuang commented on a change in pull request #15680:
URL: https://github.com/apache/airflow/pull/15680#discussion_r627068858
##########
File path: airflow/providers/amazon/aws/transfers/mongo_to_s3.py
##########
@@ -117,7 +117,6 @@ def execute(self, context) -> bool:
mongo_collection=self.mongo_collection,
query=cast(dict, self.mongo_query),
mongo_db=self.mongo_db,
- allowDiskUse=self.allow_disk_use,
Review comment:
> I think whether we should set find(allow_disk_use=True) depends on
what we want MongoToS3Operator.allow_disk_use to mean.
Though the wording seems quite different, they seem to achieve the same
thing: _use temporary files to store data when it exceeds certain memory limits
(both 100 MB RAM)_ - one is for aggregation stage while the other is for
blocking sort operation. (internally, I guess they are pretty similar) But
again, I'm not familiar with Mongo enough to be confident in what I said. It
will be nice if someone already using Mongo can verify this.
> But then the question becomes how we can pass it only to MongoDB (not
pymongo!) 4.4+ (released in July 2020) because it would crash on earlier
versions.
I think a note in the docstring saying that it requires MongoDB 4.4+ and a
link to the doc should be enough. It's not sensible for us to control the
version of an external system nor that we can. IMO, the users should know and
control what MongoDB version they are using.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]