ferruzzi commented on PR #25835: URL: https://github.com/apache/airflow/pull/25835#issuecomment-1751473029
This is pretty old, but if it's still an issue, how do we feel about accepting extra_args and then checking it against [boto3.s3.transfer.S3Transfer.ALLOWED_DOWNLOAD_ARGS](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer.ALLOWED_DOWNLOAD_ARGS) and passing only the allowed values into ExtraArgs? Based on the S3 API docs [here](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/download_file.html) load() is calling head_object(), and head_object() accepts all of the parameters listed in ALLOWED_DOWNLOAD_ARGS, so that seems to me like the best route without doing a full overhaul of the S3 hook (which IMHO is long overdue also) TL;DR: my proposal is the same as this PR, except that we ensure that only valid keys are provided to load() by checking them against the provided API list ___ A quick script that checks the Boto API and returns a dict {name: type} of all available parameters: ``` import boto3 def get_boto_params(service_name, method_name): client_meta = boto3.client(service_name).meta api_method = client_meta.method_to_api_mapping[method_name] service_model = client_meta.service_model ops_model = service_model.operation_model(api_method) inputs = ops_model.input_shape return { 'available': dict(inputs.members), **inputs.metadata } ``` `get_boto_params("s3", "head_object")` returns: ``` {'available': {'Bucket': <StringShape(BucketName)>, 'IfMatch': <StringShape(IfMatch)>, 'IfModifiedSince': <Shape(IfModifiedSince)>, 'IfNoneMatch': <StringShape(IfNoneMatch)>, 'IfUnmodifiedSince': <Shape(IfUnmodifiedSince)>, 'Key': <StringShape(ObjectKey)>, 'Range': <StringShape(Range)>, 'VersionId': <StringShape(ObjectVersionId)>, 'SSECustomerAlgorithm': <StringShape(SSECustomerAlgorithm)>, 'SSECustomerKey': <StringShape(SSECustomerKey)>, 'SSECustomerKeyMD5': <StringShape(SSECustomerKeyMD5)>, 'RequestPayer': <StringShape(RequestPayer)>, 'PartNumber': <Shape(PartNumber)>, 'ExpectedBucketOwner': <StringShape(AccountId)>, 'ChecksumMode': <StringShape(ChecksumMode)>}, 'required': ['Bucket', 'Key']} ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
