ferruzzi commented on PR #25835:
URL: https://github.com/apache/airflow/pull/25835#issuecomment-1751473029

   This is pretty old, but if it's still an issue, how do we feel about 
accepting extra_args and then checking it against 
[boto3.s3.transfer.S3Transfer.ALLOWED_DOWNLOAD_ARGS](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer.ALLOWED_DOWNLOAD_ARGS)
 and passing only the allowed values into ExtraArgs?
   
   Based on the S3 API docs 
[here](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/download_file.html)
 load() is calling head_object(), and head_object() accepts all of the 
parameters listed in ALLOWED_DOWNLOAD_ARGS, so that seems to me like the best 
route without doing a full overhaul of the S3 hook (which IMHO is long overdue 
also)
   
   TL;DR:  my proposal is the same as this PR, except that we ensure that only 
valid keys are provided to load() by checking them against the provided API list
   
   
   
   ___
   
   A quick script that checks the Boto API and returns a dict {name: type} of 
all available parameters:
   
   ```
   import boto3
   def get_boto_params(service_name, method_name):
       client_meta = boto3.client(service_name).meta
   
       api_method = client_meta.method_to_api_mapping[method_name]
       service_model = client_meta.service_model
       ops_model = service_model.operation_model(api_method)
       inputs = ops_model.input_shape
   
       return {
           'available': dict(inputs.members),
           **inputs.metadata
       } 
    ```
    `get_boto_params("s3", "head_object")` returns:
    
    ```
   {'available': {'Bucket': <StringShape(BucketName)>,
     'IfMatch': <StringShape(IfMatch)>,
     'IfModifiedSince': <Shape(IfModifiedSince)>,
     'IfNoneMatch': <StringShape(IfNoneMatch)>,
     'IfUnmodifiedSince': <Shape(IfUnmodifiedSince)>,
     'Key': <StringShape(ObjectKey)>,
     'Range': <StringShape(Range)>,
     'VersionId': <StringShape(ObjectVersionId)>,
     'SSECustomerAlgorithm': <StringShape(SSECustomerAlgorithm)>,
     'SSECustomerKey': <StringShape(SSECustomerKey)>,
     'SSECustomerKeyMD5': <StringShape(SSECustomerKeyMD5)>,
     'RequestPayer': <StringShape(RequestPayer)>,
     'PartNumber': <Shape(PartNumber)>,
     'ExpectedBucketOwner': <StringShape(AccountId)>,
     'ChecksumMode': <StringShape(ChecksumMode)>},
    'required': ['Bucket', 'Key']}
   ```
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to