potiuk commented on code in PR #30032: URL: https://github.com/apache/airflow/pull/30032#discussion_r1135422375
########## airflow/providers/amazon/aws/hooks/base_aws.py: ########## @@ -42,6 +42,7 @@ import botocore.session import requests import tenacity +from aiobotocore.session import AioSession, get_session as async_get_session Review Comment: First of all I think we need to be pragmatic and find some reasonable tacktical solutions, even if they are not perfect. And possibly work on strategic solutions to solve the problems in a long term. TL;DR; We seem to have to make some compromises for now. I think the best compromise it so allow async operators to be merged, while limiting to what we can release for AWS provider (also async) limited what pinned botocore version allows us to do. I understand @Taragolis concerns and they are very sound. The problem is that we have no easy way out of some of those problems in a short term, so the question is what we can do now and whether it will block us to fix it in the long term (if possible). > aiobotocore pin botocore and boto3 for specific version This is unfortunate but until AWS will decide to build async support into botocore, I think we will have to rely on aiobotocore playing the catch-up game. The thing with botocore is that it is pushed frequently, but in fact our users are not updating that frequently, so as long as what we are ok with limiting ourselves in what we publish in AWS operators to whatever version of botocore is pinned, this should be acceptable. This means that new features will have to wait for upgrades of aiobotocore (for example latest release of aiobotocore from a week ago enabled https://github.com/apache/airflow/pull/28850#issuecomment-1467791465 to be merged). If the AWS team is fine for it, this is fine for me. We often have delays in updating some dependencies and this is quite OK. Users could use PythonVirtual or External operator to use botocore's new features - more hassle, but I guess this is pretty acceptable solution for anything that is not available in the "regular" operators - and we should likely describe it somewhere. Also pinning the botocore in this case in our CI will be beneficial for that approach - because we will have only the "compatible" version of botocore installed. Question to @o-nikolas and the taem - maybe it is already planned or maybe you could raise it and push internally to add async support directly in boto library? That sounds like something that Amazon will be presurred to do anyway I guess? That would be a long-term strategic solution if we know at least that it is coming at some point in time. > There was plan to add more consistency I have not looked into many details of the AWS hook split - but, it looks good, and I think it can be taken and treated separately (it does not solve indeed botocore/aiobotocore pinning). The question is - is there any problem with joing the two approaches @Taragolis ? Can we continue the split while using either of the sessions? AioSession derives from botocore.Session, so while (minus pinning problems), it seems doable to define our own session essentially (guarded by TYPE_CHECKING dynamically) which would be boto.session (if no aiocore installed) or `AioSession | Session` (if installed) and that would actually help with developing providers that will be compatible with the pinned botocore (through our CI). > aiobotocore can't work with moto Yeah. more mocking in-house should be needed. Agree with @o-nikolas . Price to pay for async. > Airflow core not ready for async stuff I think this is a valid concern and not everything is perfect. And yes, it is a time-bomb potentially. I would like however to have @andrewgodwin comment on that - but I believe we use `sync_to_async` where necessary (same as in django) for places that are not ready and while it does introduce some serialization it is not catastrophic and is a good way to add **mostly** async support. I think Andrew implemented (or at least participated in it) in django for similar cases and maybe we can hear from him a comment about the currnet sync_to_async use. Maybe we could also (based on the comments) start figuring out a long term strategy on how to deal with it. I think if we agree on this direction, it will unblock tactical changes to add more deferrable operators for AWS, while we could also work on improving the infrastructure and (hopefully) get to the point where async support will be provided by botocore maybe? I don't see any other constructive way to approach it, to be honest. ########## airflow/providers/amazon/aws/hooks/base_aws.py: ########## @@ -42,6 +42,7 @@ import botocore.session import requests import tenacity +from aiobotocore.session import AioSession, get_session as async_get_session Review Comment: First of all I think we need to be pragmatic and find some reasonable tactical solutions, even if they are not perfect. And possibly work on strategic solutions to solve the problems in a long term. TL;DR; We seem to have to make some compromises for now. I think the best compromise it so allow async operators to be merged, while limiting to what we can release for AWS provider (also async) limited what pinned botocore version allows us to do. I understand @Taragolis concerns and they are very sound. The problem is that we have no easy way out of some of those problems in a short term, so the question is what we can do now and whether it will block us to fix it in the long term (if possible). > aiobotocore pin botocore and boto3 for specific version This is unfortunate but until AWS will decide to build async support into botocore, I think we will have to rely on aiobotocore playing the catch-up game. The thing with botocore is that it is pushed frequently, but in fact our users are not updating that frequently, so as long as what we are ok with limiting ourselves in what we publish in AWS operators to whatever version of botocore is pinned, this should be acceptable. This means that new features will have to wait for upgrades of aiobotocore (for example latest release of aiobotocore from a week ago enabled https://github.com/apache/airflow/pull/28850#issuecomment-1467791465 to be merged). If the AWS team is fine for it, this is fine for me. We often have delays in updating some dependencies and this is quite OK. Users could use PythonVirtual or External operator to use botocore's new features - more hassle, but I guess this is pretty acceptable solution for anything that is not available in the "regular" operators - and we should likely describe it somewhere. Also pinning the botocore in this case in our CI will be beneficial for that approach - because we will have only the "compatible" version of botocore installed. Question to @o-nikolas and the taem - maybe it is already planned or maybe you could raise it and push internally to add async support directly in boto library? That sounds like something that Amazon will be presurred to do anyway I guess? That would be a long-term strategic solution if we know at least that it is coming at some point in time. > There was plan to add more consistency I have not looked into many details of the AWS hook split - but, it looks good, and I think it can be taken and treated separately (it does not solve indeed botocore/aiobotocore pinning). The question is - is there any problem with joing the two approaches @Taragolis ? Can we continue the split while using either of the sessions? AioSession derives from botocore.Session, so while (minus pinning problems), it seems doable to define our own session essentially (guarded by TYPE_CHECKING dynamically) which would be boto.session (if no aiocore installed) or `AioSession | Session` (if installed) and that would actually help with developing providers that will be compatible with the pinned botocore (through our CI). > aiobotocore can't work with moto Yeah. more mocking in-house should be needed. Agree with @o-nikolas . Price to pay for async. > Airflow core not ready for async stuff I think this is a valid concern and not everything is perfect. And yes, it is a time-bomb potentially. I would like however to have @andrewgodwin comment on that - but I believe we use `sync_to_async` where necessary (same as in django) for places that are not ready and while it does introduce some serialization it is not catastrophic and is a good way to add **mostly** async support. I think Andrew implemented (or at least participated in it) in django for similar cases and maybe we can hear from him a comment about the currnet sync_to_async use. Maybe we could also (based on the comments) start figuring out a long term strategy on how to deal with it. I think if we agree on this direction, it will unblock tactical changes to add more deferrable operators for AWS, while we could also work on improving the infrastructure and (hopefully) get to the point where async support will be provided by botocore maybe? I don't see any other constructive way to approach it, to be honest. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
