tuzonghua opened a new issue, #44896:
URL: https://github.com/apache/airflow/issues/44896

   ### Apache Airflow Provider(s)
   
   amazon
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon==8.28.0
   
   ### Apache Airflow version
   
   2.9.3
   
   ### Operating System
   
   macOS 15.1.1
   
   ### Deployment
   
   Google Cloud Composer
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   When using the `check_fn` function in `S3KeySensor`, there's no way for the 
function to check against a specific object name. The only attributes passed to 
it are those returned by the S3 `head_object` API call, which include neither 
the prefix nor the object name itself. 
   
   ### What you think should happen instead
   
   Since `check_fn` takes in a list of file sizes, each size should also be 
mapped to its S3 key so there's flexibility in how to filter the list.
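
As a rough sketch of what I have in mind (not an existing API): assume each
entry handed to `check_fn` carried a `Key` field next to `Size`. The `Key`
field, the function name `check_fn_with_keys`, and the constant
`MIN_SIZE_BYTES` are all hypothetical and only serve to illustrate the idea.

```python
from typing import Any

MIN_SIZE_BYTES = 524288  # 0.5 MiB, the threshold used in this report


def check_fn_with_keys(files: list[dict[str, Any]], **kwargs: Any) -> bool:
    """Return True if at least one hadoop data file exists and all are large enough.

    Assumes (hypothetically) that every entry has both "Key" and "Size".
    """
    # Filter by object name, which is only possible if "Key" is present.
    data_files = [f for f in files if "hadoop" in f.get("Key", "")]
    return bool(data_files) and all(
        f.get("Size", 0) > MIN_SIZE_BYTES for f in data_files
    )


# Entries shaped the way this report proposes, using objects from the bucket listing.
example_files = [
    {"Key": "path/to/some/files/_SUCCESS", "Size": 0},
    {
        "Key": "path/to/some/files/000000_0-hadoop_20241212010840_abcdef.gz",
        "Size": 18348549,
    },
]

print(check_fn_with_keys(example_files))
```

With the key available, the zero-byte `_SUCCESS` marker can be ignored by name
instead of tripping the size check.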
   
   ### How to reproduce
   
   If there is a bucket with the following objects:
   ```
   $ aws s3 ls s3://test-bucket/path/to/some/files
   2024-12-11 20:09:12   18348549 000000_0-hadoop_20241212010840_abcdef.gz
   2024-12-11 20:09:14   16543931 000001_0-hadoop_20241212010840_sadfjwij.gz
   2024-12-11 20:09:49          0 _SUCCESS
   ```
   and the following `S3KeySensor`:
   ```python
   from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

   check_for_file_in_s3 = S3KeySensor(
       task_id="check_for_file_in_s3",
       soft_fail=True,
       mode="reschedule",
       poke_interval=0,
       timeout=0,
       bucket_name="test-bucket",
       bucket_key=[
           "path/to/some/files/_SUCCESS",
           "path/to/some/files/000000_0-hadoop_*",
       ],
       aws_conn_id="spend327_aws_connection",
       retries=0,
       wildcard_match=True,
       check_fn=check_fn,  # defined below
   )
   ```
   then the following `check_fn` will never succeed:
   ```python
   from typing import Any


   def check_fn(files: list, **kwargs: Any) -> bool:
       """
       Check that the data file is greater than 0.5 megabyte

       :param files: List of S3 object attributes.
       :return: True if the criteria are met
       """
       for file in files:
           # Each `file` is a dict of object attributes, so `in` tests its
           # keys; the object name is not among them.
           if "hadoop" in file:
               return any(f.get("Size", 0) > 524288 for f in files)
           elif "SUCCESS" in file:
               return True
           else:
               return False
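
To make the failure concrete without an S3 bucket, here is a minimal simulation.
It assumes the sensor passes one dict per matched object containing only
`Size` (what I observe with the default settings); the list
`simulated_files` is made up from the bucket listing above.

```python
from typing import Any

# Assumption: each matched object is reduced to a dict holding only its
# size; neither the key nor the prefix is included.
simulated_files = [
    {"Size": 0},         # _SUCCESS
    {"Size": 18348549},  # 000000_0-hadoop_20241212010840_abcdef.gz
]


def check_fn(files: list, **kwargs: Any) -> bool:
    """Same logic as the check_fn in this report."""
    for file in files:
        # `in` on a dict tests its keys, and the only key is "Size".
        if "hadoop" in file:
            return any(f.get("Size", 0) > 524288 for f in files)
        elif "SUCCESS" in file:
            return True
        else:
            return False


result = check_fn(simulated_files)
print(result)
```

Because neither `"hadoop"` nor `"SUCCESS"` is ever a key of the attribute dict,
the function falls through to `return False` on the first element.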
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

