Taragolis commented on code in PR #30032:
URL: https://github.com/apache/airflow/pull/30032#discussion_r1154835772
##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -72,7 +74,7 @@
class BaseSessionFactory(LoggingMixin):
"""
- Base AWS Session Factory class to handle boto3 session creation.
+ Base AWS Session Factory class to handle synchronous and async boto
session creation.
Review Comment:
> For this PR, the trigger, which contains all the async code, is only
creating the async client, which is a necessary task, and making the network
call which is async. The client creation only happens once per execution of a
Trigger.
Unfortunetly this is not how it actually work. First of all
`BaseSessionFactory` required Airflow connection in most cases, except cases
where `aws_conn_id = None`, and obtain Airflow connection it is not support
asyncio.
In additional all cases where not required AWS credentials, such as
AssumeRole, which should be quite popular if Airflow deployed in AWS
environment:
https://github.com/apache/airflow/blob/21bacb578a9bdb9a91a297f1ae019402be1c1368/airflow/providers/amazon/aws/hooks/base_aws.py#L190-L195
This is a good example of mixin of `botocore`, `boto3` and `aiobotocre`
which I don't think work well at all. Except unittest we do not have anything
for test this part. Also for obtain initial credentials it is use STS client
(from boto3/botocore) which also doesn't use asyncio implementation.
In addition, AFAIK, some of the part not supported by `aiobotocore` at all
such as Google Federation, SPNEGO/Kerberos and EKS IRSA
As result this call in deferrable mode call a lot of **not asyncio** code
which block io:
https://github.com/apache/airflow/blob/21bacb578a9bdb9a91a297f1ae019402be1c1368/airflow/providers/amazon/aws/triggers/redshift_cluster.py#L121
> Can you provide some more information about this? I'm not sure what you
mean. There is a [watchdog
mechanism](https://github.com/apache/airflow/blob/main/airflow/jobs/triggerer_job.py#L547)
in Airflow which prints out a warning if the async thread is blocked for too
long.
Personally I found that this part print a lot of false positive
notification, even if at this moment there are no deferrable operators run.
---
Just for the record, I'm not sure that is possible to easily create asyncio
implementation for AWS, which support most of the popular part as supported by
regular implementation due to different limitations:
1. Core of Airflow has lack of support asyncio connections
2. aiobotocore not a fully replacement for botocore and especially for boto3
Anyway it just my thoughts and I should write it down just for try to avoid
add some feature which might make someone life harder (especially for
contributors).
Potentially better idea is extend `BaseAsyncSessionFactory` rather than
`BaseSessionFactory` and describe in docs which auth options supported right
now and which are not, that should make users life easier. BTW right now users
might create their own class based on `BaseSessionFactory` but not for
`BaseAsyncSessionFactory`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]