molcay commented on issue #11323: URL: https://github.com/apache/airflow/issues/11323#issuecomment-3836461466
> > Hi [@shahar1](https://github.com/shahar1),
> >
> > I have a concern about this implementation.
> >
> > Since `XCom` has a hard limit of 1 GB (due to the DB field type `JSONB`) and the return value from an Operator is stored in `XCom`, do we need to do something extra? I know 1 GB is a huge size, but still...
> >
> > Let's assume you are migrating a bucket to another bucket (this could be trained AI model data) and try to move the data between buckets. I think we could hit that limit in this kind of scenario.
>
> Hey, thanks for raising these concerns!
>
> 1. Some operators included in this issue (still) transfer only one file, so I wouldn't worry about them.
> 2. Specifically for transferring an entire bucket, especially one that includes a lot of files, I don't think the "regular" GCS2GCS transfer operator is the right tool to use (it consumes Airflow's own resources for the transfer); I'd rather use GCP's Storage Transfer Service instead (deferring the work to GCP).
> 3. If you insist on using the GCS2GCS operator, you could always pass `do_xcom_push=False` and avoid the XCom being pushed at all :)

Hey, thank you for the answer.

I was concerned about someone using bucket-to-bucket transfer via this "bad practice" :) I was not aware of the `do_xcom_push` flag. If we have a way to avoid this, that is good 👍🏼

Currently, from the user's perspective, anyone using this "bad practice" will hit this issue. After that, they have some options:

- Change the implementation and use the proper approach (like the Storage Transfer Service)
- Change the implementation and pass `do_xcom_push=False` to avoid this behavior

> I am not sure there is a service for cross-cloud transfers; however, we already have `do_xcom_push`.
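The `do_xcom_push` escape hatch works because Airflow's `BaseOperator` only persists an operator's return value to XCom when that flag is true. A minimal toy sketch of that gating (the `MiniXCom`/`MiniOperator` names are illustrative stand-ins, not Airflow's actual API):

```python
# Toy sketch of how a do_xcom_push flag can gate storing a task's
# return value -- illustrative only, not Airflow's real implementation.

class MiniXCom:
    """Toy XCom backend: maps task_id -> pushed return value."""
    def __init__(self):
        self.store = {}

    def push(self, task_id, value):
        self.store[task_id] = value


class MiniOperator:
    def __init__(self, task_id, do_xcom_push=True):
        self.task_id = task_id
        self.do_xcom_push = do_xcom_push

    def execute(self):
        # Pretend this is the (possibly huge) list of copied objects
        # a bucket-to-bucket transfer might return.
        return [f"gs://dst-bucket/obj-{i}" for i in range(3)]

    def run(self, xcom):
        result = self.execute()
        # The return value only reaches the XCom store when the flag
        # is left at its default of True.
        if self.do_xcom_push:
            xcom.push(self.task_id, result)
        return result


xcom = MiniXCom()
MiniOperator("copy_with_push").run(xcom)
MiniOperator("copy_no_push", do_xcom_push=False).run(xcom)
print(sorted(xcom.store))  # → ['copy_with_push']
```

With `do_xcom_push=False` the transfer still runs and returns its result to the caller; it is only the serialization into the metadata DB (and hence the 1 GB `JSONB` ceiling) that is skipped.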
