[GitHub] [airflow] uranusjr commented on a change in pull request #15680: MongoToS3Operator failed when running with a single query (not aggregate pipeline)

GitBox Wed, 05 May 2021 20:05:23 -0700


uranusjr commented on a change in pull request #15680:
URL: https://github.com/apache/airflow/pull/15680#discussion_r627046024




##########
File path: airflow/providers/amazon/aws/transfers/mongo_to_s3.py
##########
@@ -117,7 +117,6 @@ def execute(self, context) -> bool:
                 mongo_collection=self.mongo_collection,
                 query=cast(dict, self.mongo_query),
                 mongo_db=self.mongo_db,
-                allowDiskUse=self.allow_disk_use,

Review comment:
       The `allow_disk_use` argument in `.find()` maps to MongoDB’s
[`cursor.allowDiskUse`](https://docs.mongodb.com/manual/reference/method/cursor.allowDiskUse/),
while `.aggregate()`’s `allowDiskUse` corresponds to
[`allowDiskUse` in the aggregation pipeline](https://docs.mongodb.com/manual/reference/command/aggregate/#mongodb-dbcommand-dbcmd.aggregate).
I’m honestly not familiar with `cursor.allowDiskUse` (in fact I didn’t know it
existed until today), but from the documentation the two are quite different.
   
   I think whether we should set `find(allow_disk_use=True)` depends on what we 
want `MongoToS3Operator.allow_disk_use` to mean. The docstring says
   
   > allow_disk_use: in the case you are retrieving a lot of data, you may have 
to use the disk to save it instead of saving all in the RAM
   
   which seems to indicate it probably makes sense to set 
`find(allow_disk_use=True)` from it. But then the question becomes how we can 
pass it only to MongoDB (not pymongo!) 4.4+ (released in July 2020) because it 
would crash on earlier versions.
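   
   One possible way to gate this, as a rough sketch only: read the server
version (pymongo’s `MongoClient.server_info()` returns the `buildInfo` document,
which includes a `version` string) and add the keyword only for 4.4+. The helper
name `find_kwargs` is made up for illustration; it is not part of the provider
or of pymongo.

```python
import re


def find_kwargs(server_version: str, allow_disk_use: bool) -> dict:
    """Build the extra keyword arguments for Collection.find().

    Hypothetical helper: allow_disk_use is only included when the MongoDB
    server (not pymongo) reports version 4.4 or newer.
    """
    match = re.match(r"(\d+)\.(\d+)", server_version)
    if match is None:
        # Unrecognized version format: leave the option out to be safe.
        return {}
    major_minor = (int(match.group(1)), int(match.group(2)))
    if allow_disk_use and major_minor >= (4, 4):
        return {"allow_disk_use": True}
    return {}


# Intended use inside the operator (assumes a connected pymongo client):
#   version = client.server_info()["version"]
#   collection.find(query, **find_kwargs(version, self.allow_disk_use))
```

   This also assumes the installed pymongo accepts `allow_disk_use` in
`find()`, which as far as I know requires pymongo 3.11+, and it costs one extra
round trip to the server to fetch the version.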




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
