Taragolis commented on code in PR #30032: URL: https://github.com/apache/airflow/pull/30032#discussion_r1136239727
########## airflow/providers/amazon/aws/hooks/base_aws.py: ########## @@ -42,6 +42,7 @@ import botocore.session import requests import tenacity +from aiobotocore.session import AioSession, get_session as async_get_session Review Comment: Sorry for late response, do not have time for Airflow project now, and even do not use it in production right now. So better say I'm not a user of Airflow right now. I would try to to answer for some messages what I miss. Note: Everything just my opinion, and I wouldn't try to change anyone opinion mind, because I found on real experience that this not work and all changes request could be removed > That 7 month window is kind of the worst case. I'm seeing aiobotocore releases which bump botocore as short as one month (and everything in between). I wouldn't so optimistic about `aiobotocore` releases, all `botocore` bumps since late 2017. | Bump Date | Version | Days Since LU | Months Since LU | |--------------|----------|--------------:|----------------:| | Mar 7, 2023 | 1.29.76 | 225 | 7,4 | | Aug 25, 2022 | 1.27.59 | 130 | 4,3 | | Mar 17, 2022 | 1.24.21 | 93 | 3,1 | | Dec 14, 2021 | 1.23.24 | 42 | 1,4 | | Nov 2, 2021 | 1.22.8 | 294 | 9,7 | | Jan 12, 2021 | 1.19.52 | 178 | 5,9 | | Aug 18, 2020 | 1.17.44 | 148 | 4,9 | | Feb 21, 2020 | 1.15.3 | 101 | 3,3 | | Nov 12, 2019 | 1.13.14 | 18 | 0,6 | | Oct 25, 2019 | 1.12.252 | 99 | 3,3 | | Jul 18, 2019 | 1.12.189 | 159 | 5,2 | | Feb 9, 2019 | 1.12.91 | 62 | 2 | | Dec 9, 2018 | 1.12.49 | 143 | 4,7 | | Jul 19, 2018 | 1.10.58 | 75 | 2,5 | | May 5, 2018 | 1.10.12 | 124 | 4,1 | | Jan 1, 2018 | 1.8.21 | 0 | 0 | | Jan 1, 2018 | 1.8.20 | 0 | 0 | | Jan 1, 2018 | 1.8.12 | 0 | 0 | | Jan 1, 2018 | 1.7.48 | 47 | 1,5 | | Oct 15, 2017 | 1.7.27 | n/a | n/a | So if exclude multiple bumps in 2018-01-01 than optimistic scenario 4 bumps per year, pessimistic 2 bumps per year and 3.7 median delay per update. > I personally haven't looked into this. But, sure, we may have to do more manual mocking and testing for the aio usecase. In past we have fully mocked version of auth as result: https://github.com/apache/airflow/pull/27198 which we found only after 2 months, and affected Amazon Provider 5.1.0 and 6.0.0 (bundled in AWS MWAA 2.4.3) > Honestly, I think this ship has long since sailed. I agree that the architecture of deferrable operators/trigger is a bit clunky, leads to a lot of code duplication, and has this duality of sync vs async hooks. But the fact of the matter is Deferrable Operator support is public and we hear from users and customers that they want to use it. So I think an incremental/agile approach is often the best way to approach it. If we wait until there is perfection, we'd never ship anything. Potentially users doesn't know about all side effects of Deferrable Operators, because it hided from them and presented as better that sync implementation. To be honest we do not have any comparison of Sync vs Deferrable for consumption resources, latency, and average time to execute. > Maybe a helpful exercise would be to list some concrete examples of how this PR would degrade an Airflow user's experience more than it would improve it (other than the botocore version, that one is quite clear). This PR only uses blocking io implementation, that mean event loop of triggerer would be blocked. > I have not looked into many details of the AWS hook split - but, it looks good, and I think it can be taken and treated separately (it does not solve indeed botocore/aiobotocore pinning). The question is - is there any problem with joing the two approaches @Taragolis ? My opinion is that async for Amazon Provider should be Optional dependency which better live separate of Sync implementation: - Async will not supported all parameters of AWS Connection - `boto3.client` in most cases is basically `botocore.{SomeSpecificClient}`, except `AWS S3` - Dependency hell if user want to upgrade `boto3`, that back to the past when `boto3` and `botocore` have weird dependency policy and make `pip` resolve crazy > I think this is a valid concern and not everything is perfect. And yes, it is a time-bomb potentially. I would like however to have @andrewgodwin comment on that - but I believe we use sync_to_async where necessary (same as in django) for places that are not ready and while it does introduce some serialization it is not catastrophic and is a good way to add mostly async support.  There were PR's which initially tried almost everything wrap to `sync_to_async`. I do not check all of them maybe some of them merged into some provider or core, I've only known about this ones: - https://github.com/apache/airflow/pull/29260 - https://github.com/apache/airflow/pull/29313 I'm more unhappy that we do not disclose information to users about all side effect of `sync_to_async` deferrable operators implementation, and this affect all of them except `TimeSensorAsync` and `TimeDeltaSensorAsync` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
