Taragolis commented on code in PR #35273:
URL: https://github.com/apache/airflow/pull/35273#discussion_r1376510926
##########
airflow/providers/amazon/aws/fs/s3.py:
##########
@@ -85,7 +85,15 @@ def get_fs(conn_id: str | None) -> AbstractFileSystem:
if proxy_uri := s3_service_config.get(S3_PROXY_URI, None):
config_kwargs["proxies"] = {"http": proxy_uri, "https": proxy_uri}
- fs = S3FileSystem(session=session, config_kwargs=config_kwargs,
endpoint_url=endpoint_url)
+ anon = False
+ if (
+ aws.conn_config.aws_access_key_id is None
+ and aws.conn_config.aws_secret_access_key is None
+ and aws.conn_config.aws_session_token is None
+ ):
+ anon = True
Review Comment:
> The issue is that with an empty connection id and no env variables it will
not default to anonymous,
`None` = use [default `boto3` credentials
strategy](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#configuring-credentials),
and this is not mean use anonymous. Depend on situation credentials check
might work, and might not work.
E.g. user deploy on EC2 with enabled Instance IAM Profile, in this case
credentials might be empty, but session.get_credentials() return something
different than `None`
And even more funny stuff: for historical reason AWS Connection fallback to
boto3 default strategy even if connection id not found which is personally for
me "Have a nice debugging time" 🙄
I'm not sure, but maybe better to check
config_kwargs.get("signature_version") == "unsigned".
However it is also doesn't guarantee that we correctly guess is it required
or not. All possible cases might be only resolved when we get finally
constructed `botocore.config.Config` in `AwsGenericHook.
_get_config(config=None)`
https://github.com/apache/airflow/blob/1aa91a482630f389005a9a36b0f038e675163411/airflow/providers/amazon/aws/hooks/base_aws.py#L638-L654
##########
airflow/providers/amazon/aws/fs/s3.py:
##########
@@ -85,7 +85,15 @@ def get_fs(conn_id: str | None) -> AbstractFileSystem:
if proxy_uri := s3_service_config.get(S3_PROXY_URI, None):
config_kwargs["proxies"] = {"http": proxy_uri, "https": proxy_uri}
- fs = S3FileSystem(session=session, config_kwargs=config_kwargs,
endpoint_url=endpoint_url)
+ anon = False
+ if (
+ aws.conn_config.aws_access_key_id is None
+ and aws.conn_config.aws_secret_access_key is None
+ and aws.conn_config.aws_session_token is None
+ ):
+ anon = True
Review Comment:
> The issue is that with an empty connection id and no env variables it will
not default to anonymous,
`None` = use [default `boto3` credentials
strategy](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#configuring-credentials),
and this is not mean use anonymous. Depend on situation credentials check
might work, and might not work.
E.g. user deploy on EC2 with enabled Instance IAM Profile, in this case
credentials might be empty, but session.get_credentials() return something
different than `None`
And even more funny stuff: for historical reason AWS Connection fallback to
boto3 default strategy even if connection id not found which is personally for
me "Have a nice debugging time" 🙄
I'm not sure, but maybe better to check
config_kwargs.get("signature_version") == "unsigned".
However it is also doesn't guarantee that we correctly guess is it required
or not. All possible cases might be only resolved when we get finally
constructed `botocore.config.Config` in `AwsGenericHook.
_get_config(config=None)`
https://github.com/apache/airflow/blob/1aa91a482630f389005a9a36b0f038e675163411/airflow/providers/amazon/aws/hooks/base_aws.py#L638-L654
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]