Taragolis commented on code in PR #30032:
URL: https://github.com/apache/airflow/pull/30032#discussion_r1136239727


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -42,6 +42,7 @@
 import botocore.session
 import requests
 import tenacity
+from aiobotocore.session import AioSession, get_session as async_get_session

Review Comment:
   Sorry for late response, do not have time for Airflow project now, and even 
do not use it in production right now. So better say I'm not a user of Airflow 
right now.
   
   I would try to to answer for some messages what I miss.
   
   Note: Everything just my opinion, and I wouldn't try to change anyone 
opinion mind, because I found on real experience that this not work and all 
changes request could be removed   
   
   > That 7 month window is kind of the worst case. I'm seeing aiobotocore 
releases which bump botocore as short as one month (and everything in between).
   
   I wouldn't so optimistic about `aiobotocore` releases, all `botocore` bumps 
since late 2017.
   
   | Bump Date    | Version  | Days Since LU | Months Since LU |
   |--------------|----------|--------------:|----------------:|
   | Mar 7, 2023  | 1.29.76  |           225 |             7,4 |
   | Aug 25, 2022 | 1.27.59  |           130 |             4,3 |
   | Mar 17, 2022 | 1.24.21  |            93 |             3,1 |
   | Dec 14, 2021 | 1.23.24  |            42 |             1,4 |
   | Nov 2, 2021  | 1.22.8   |           294 |             9,7 |
   | Jan 12, 2021 | 1.19.52  |           178 |             5,9 |
   | Aug 18, 2020 | 1.17.44  |           148 |             4,9 |
   | Feb 21, 2020 | 1.15.3   |           101 |             3,3 |
   | Nov 12, 2019 | 1.13.14  |            18 |             0,6 |
   | Oct 25, 2019 | 1.12.252 |            99 |             3,3 |
   | Jul 18, 2019 | 1.12.189 |           159 |             5,2 |
   | Feb 9, 2019  | 1.12.91  |            62 |               2 |
   | Dec 9, 2018  | 1.12.49  |           143 |             4,7 |
   | Jul 19, 2018 | 1.10.58  |            75 |             2,5 |
   | May 5, 2018  | 1.10.12  |           124 |             4,1 |
   | Jan 1, 2018  | 1.8.21   |             0 |               0 |
   | Jan 1, 2018  | 1.8.20   |             0 |               0 |
   | Jan 1, 2018  | 1.8.12   |             0 |               0 |
   | Jan 1, 2018  | 1.7.48   |            47 |             1,5 |
   | Oct 15, 2017 | 1.7.27   |           n/a |             n/a |
   
   So if exclude multiple bumps in 2018-01-01 than optimistic scenario 4 bumps 
per year, pessimistic 2 bumps per year and 3.7 median delay per update.
   
   > I personally haven't looked into this. But, sure, we may have to do more 
manual mocking and testing for the aio usecase.
   
   In past we have fully mocked version of auth as result: 
https://github.com/apache/airflow/pull/27198 which we found only after 2 
months, and affected Amazon Provider 5.1.0 and 6.0.0 (bundled in AWS MWAA 2.4.3)
   
   >  Honestly, I think this ship has long since sailed. I agree that the 
architecture of deferrable operators/trigger is a bit clunky, leads to a lot of 
code duplication, and has this duality of sync vs async hooks. But the fact of 
the matter is Deferrable Operator support is public and we hear from users and 
customers that they want to use it. So I think an incremental/agile approach is 
often the best way to approach it. If we wait until there is perfection, we'd 
never ship anything.
   
   Potentially users doesn't know about all side effects of Deferrable 
Operators, because it hided from them and presented as better that sync 
implementation.
   To be honest we do not have any comparison of Sync vs Deferrable for 
consumption resources, latency, and average time to execute.
   
   > Maybe a helpful exercise would be to list some concrete examples of how 
this PR would degrade an Airflow user's experience more than it would improve 
it (other than the botocore version, that one is quite clear).
   
   This PR only uses blocking io implementation, that mean event loop of 
triggerer would be blocked.
   
   > I have not looked into many details of the AWS hook split - but, it looks 
good, and I think it can be taken and treated separately (it does not solve 
indeed botocore/aiobotocore pinning). The question is - is there any problem 
with joing the two approaches @Taragolis ?
   
   My opinion is that async for Amazon Provider should be Optional dependency 
which better live separate of Sync implementation:
   - Async will not supported all parameters of AWS Connection
   - `boto3.client` in most cases is basically `botocore.{SomeSpecificClient}`, 
except `AWS S3`
   - Dependency hell if user want to upgrade `boto3`, that back to the past 
when `boto3` and `botocore` have weird dependency policy and make `pip` resolve 
crazy
   
   > I think this is a valid concern and not everything is perfect. And yes, it 
is a time-bomb potentially. I would like however to have @andrewgodwin comment 
on that - but I believe we use sync_to_async where necessary (same as in 
django) for places that are not ready and while it does introduce some 
serialization it is not catastrophic and is a good way to add mostly async 
support.
   
   
![image](https://user-images.githubusercontent.com/3998685/225137845-0063d24a-1868-49da-8030-123c9275fd0e.png)
   
   There were PR's which initially tried almost everything wrap to 
`sync_to_async`. I do not check all of them maybe some of them merged into some 
provider or core, I've only known about this ones:
   - https://github.com/apache/airflow/pull/29260
   - https://github.com/apache/airflow/pull/29313
   
   I'm more unhappy that we do not disclose information to users about all side 
effect of `sync_to_async` deferrable operators implementation, and this affect 
all of them except `TimeSensorAsync` and `TimeDeltaSensorAsync`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to