potiuk commented on code in PR #30032:
URL: https://github.com/apache/airflow/pull/30032#discussion_r1135422375


##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -42,6 +42,7 @@
 import botocore.session
 import requests
 import tenacity
+from aiobotocore.session import AioSession, get_session as async_get_session

Review Comment:
   First of all I think we need to be pragmatic and find some reasonable 
tacktical solutions, even if they are not perfect. And possibly work on 
strategic solutions to solve the problems in a long term.
   
   TL;DR; We seem to have to make some compromises for now. I think the best 
compromise it so allow async operators to be merged, while limiting to what we 
can release for AWS provider (also async) limited what pinned botocore version 
allows us to do.
   
   I understand @Taragolis concerns and they are very sound. The problem is 
that we have no easy way out of some  of those problems  in a short term, so 
the question is what we can do now and whether it will block us to fix it in 
the long term (if possible).
   
   > aiobotocore pin botocore and boto3 for specific version
   
   This is unfortunate but until AWS will decide to build async support into 
botocore, I think we will  have to rely on aiobotocore playing the catch-up 
game. The thing with botocore is that it is pushed frequently, but in fact our 
users are not updating that frequently, so as long as what we are ok with 
limiting ourselves in what we publish in AWS operators to whatever version of 
botocore is pinned, this should be acceptable. This means that new features 
will have to wait for upgrades of aiobotocore (for example latest release of 
aiobotocore from a week ago enabled 
https://github.com/apache/airflow/pull/28850#issuecomment-1467791465 to be 
merged).  If the AWS team is fine for it, this is fine for me. We often have 
delays in updating some dependencies and this is quite OK. 
    
   Users could use PythonVirtual or External operator to use botocore's new 
features - more hassle, but I guess this is pretty acceptable solution for 
anything that is not available in the "regular" operators - and we should 
likely describe it somewhere. 
   
   Also pinning the botocore in this case in our CI will be beneficial for that 
approach - because we will have only the "compatible" version of botocore 
installed. 
    
   Question to @o-nikolas and the taem - maybe it is already planned or maybe 
you could raise it and push internally to add async support directly in boto 
library? That sounds like something that Amazon will be presurred to do anyway 
I guess? That would be a long-term strategic solution if we know at least that 
it is coming at some point in time.
   
   > There was plan to add more consistency
   
   I have not looked into many details of the AWS hook split - but, it looks 
good, and I think it can be taken and treated separately (it does not solve 
indeed botocore/aiobotocore pinning). The question is - is there any problem 
with joing the two approaches @Taragolis ? Can we continue the split while 
using either of the sessions? AioSession derives from  botocore.Session, so 
while (minus pinning problems), it seems doable to define our own session 
essentially (guarded by TYPE_CHECKING dynamically) which would be boto.session 
(if no aiocore installed) or `AioSession | Session` (if installed) and that 
would actually help with developing providers that will be compatible with the 
pinned botocore (through our CI).
   
   > aiobotocore can't work with moto
   
   Yeah. more mocking in-house should be needed. Agree with @o-nikolas . Price 
to pay for async.
   
   > Airflow core not ready for async stuff
   
   I think this is a valid concern and not everything is perfect. And yes, it 
is a time-bomb potentially. I would like however to have @andrewgodwin comment 
on that - but I believe we use `sync_to_async` where necessary (same as in 
django) for places that are not ready and while it does introduce some 
serialization it is not catastrophic and is a good way to add **mostly** async 
support. I think Andrew implemented (or at least participated in it) in django 
for similar cases and maybe we can hear from him a comment about the currnet 
sync_to_async use. Maybe we could also (based on the comments) start figuring 
out a long term strategy on how to deal with it.
   
   I think if we agree on this direction, it will unblock tactical changes to 
add more deferrable operators for AWS, while we could also work on improving 
the infrastructure and (hopefully) get to the point where async support will be 
provided by botocore maybe? 
   
   I don't see any other constructive way to approach it, to be honest.
   



##########
airflow/providers/amazon/aws/hooks/base_aws.py:
##########
@@ -42,6 +42,7 @@
 import botocore.session
 import requests
 import tenacity
+from aiobotocore.session import AioSession, get_session as async_get_session

Review Comment:
   First of all I think we need to be pragmatic and find some reasonable 
tactical solutions, even if they are not perfect. And possibly work on 
strategic solutions to solve the problems in a long term.
   
   TL;DR; We seem to have to make some compromises for now. I think the best 
compromise it so allow async operators to be merged, while limiting to what we 
can release for AWS provider (also async) limited what pinned botocore version 
allows us to do.
   
   I understand @Taragolis concerns and they are very sound. The problem is 
that we have no easy way out of some  of those problems  in a short term, so 
the question is what we can do now and whether it will block us to fix it in 
the long term (if possible).
   
   > aiobotocore pin botocore and boto3 for specific version
   
   This is unfortunate but until AWS will decide to build async support into 
botocore, I think we will  have to rely on aiobotocore playing the catch-up 
game. The thing with botocore is that it is pushed frequently, but in fact our 
users are not updating that frequently, so as long as what we are ok with 
limiting ourselves in what we publish in AWS operators to whatever version of 
botocore is pinned, this should be acceptable. This means that new features 
will have to wait for upgrades of aiobotocore (for example latest release of 
aiobotocore from a week ago enabled 
https://github.com/apache/airflow/pull/28850#issuecomment-1467791465 to be 
merged).  If the AWS team is fine for it, this is fine for me. We often have 
delays in updating some dependencies and this is quite OK. 
    
   Users could use PythonVirtual or External operator to use botocore's new 
features - more hassle, but I guess this is pretty acceptable solution for 
anything that is not available in the "regular" operators - and we should 
likely describe it somewhere. 
   
   Also pinning the botocore in this case in our CI will be beneficial for that 
approach - because we will have only the "compatible" version of botocore 
installed. 
    
   Question to @o-nikolas and the taem - maybe it is already planned or maybe 
you could raise it and push internally to add async support directly in boto 
library? That sounds like something that Amazon will be presurred to do anyway 
I guess? That would be a long-term strategic solution if we know at least that 
it is coming at some point in time.
   
   > There was plan to add more consistency
   
   I have not looked into many details of the AWS hook split - but, it looks 
good, and I think it can be taken and treated separately (it does not solve 
indeed botocore/aiobotocore pinning). The question is - is there any problem 
with joing the two approaches @Taragolis ? Can we continue the split while 
using either of the sessions? AioSession derives from  botocore.Session, so 
while (minus pinning problems), it seems doable to define our own session 
essentially (guarded by TYPE_CHECKING dynamically) which would be boto.session 
(if no aiocore installed) or `AioSession | Session` (if installed) and that 
would actually help with developing providers that will be compatible with the 
pinned botocore (through our CI).
   
   > aiobotocore can't work with moto
   
   Yeah. more mocking in-house should be needed. Agree with @o-nikolas . Price 
to pay for async.
   
   > Airflow core not ready for async stuff
   
   I think this is a valid concern and not everything is perfect. And yes, it 
is a time-bomb potentially. I would like however to have @andrewgodwin comment 
on that - but I believe we use `sync_to_async` where necessary (same as in 
django) for places that are not ready and while it does introduce some 
serialization it is not catastrophic and is a good way to add **mostly** async 
support. I think Andrew implemented (or at least participated in it) in django 
for similar cases and maybe we can hear from him a comment about the currnet 
sync_to_async use. Maybe we could also (based on the comments) start figuring 
out a long term strategy on how to deal with it.
   
   I think if we agree on this direction, it will unblock tactical changes to 
add more deferrable operators for AWS, while we could also work on improving 
the infrastructure and (hopefully) get to the point where async support will be 
provided by botocore maybe? 
   
   I don't see any other constructive way to approach it, to be honest.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to