syedahsn commented on code in PR #30032:
URL: https://github.com/apache/airflow/pull/30032#discussion_r1158552512


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -72,7 +74,7 @@
 
 class BaseSessionFactory(LoggingMixin):
     """
-    Base AWS Session Factory class to handle boto3 session creation.
+    Base AWS Session Factory class to handle synchronous and async boto 
session creation.

Review Comment:
    It's true that getting credentials from the database is a blocking io 
operation, and will hold up the async thread when the connection is made, but I 
don't think that diminishes the advantages of using deferrable operators too 
significantly. Just taking the example of creating a Redshift cluster, which 
takes around 5 minutes to fully provision, if we can free up a worker so that 
those 5 minutes are used effectively, then the time spent holding up the async 
thread (which is usually around 0.2 to 0.8 seconds in my experience) is 
negligible. I do think core airflow can be improved to better support asyncio 
connections, and it is something that we should make efforts to achieve, but I 
think it can be done in parallel.
   
   > Except unittest we do not have anything for test this part. Also for 
obtain initial credentials it is use STS client (from boto3/botocore) which 
also doesn't use asyncio implementation.
   
   For the unit tests, I was able to work past the sts mocking, and I am in the 
process of adding more tests. It is slightly more cumbersome to not be able to 
use moto, but that is the price of using asyncio, and considering the benefits, 
worth paying in my opinion.
   
   >In addition, AFAIK, some of the part not supported by aiobotocore at all 
such as Google Federation, SPNEGO/Kerberos and EKS IRSA
   
   This is something that again can be incorporated as the need for these 
features become apparent. Ideally, the async_conn property is something that we 
wouldn't encourage users to use on their own, because of the limitations of 
asyncio. The goal is to provide users with the option of using deferrable 
operators without them having to deal with asyncio. 
   
   
   >Potentially better idea is extend BaseAsyncSessionFactory rather than 
BaseSessionFactory
   
   This is an approach that has been taken, and is a valid approach. However, I 
feel that it leads to large amounts of code duplication, which will become 
difficult to manage as things progress. I think it is better to include async 
functionality to BaseSessionFactory, and deal with issues as they come up, so 
that as things move forward, we work towards a more unified code base, rather 
than ending up with 2 similar but separate code bases.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to