syun64 opened a new issue, #28272:
URL: https://github.com/apache/airflow/issues/28272

   ### Apache Airflow Provider(s)
   
   amazon
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon==6.2.0
   
   ### Apache Airflow version
   
   2.5.0
   
   ### Operating System
   
   Red Hat Enterprise Linux Server 7.6 (Maipo)
   
   ### Deployment
   
   Virtualenv installation
   
   ### Deployment details
   
   Simple virtualenv deployment
   
   ### What happened
   
   bucket_key is a template_field in S3KeySensor, which means it is expected 
to be rendered from a template string.
   
   The supported types for the attribute are both 'str' and 'list'. There is 
also a [conditional operation in the __init__ 
function](https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/sensors/s3.py#L89)
 of the class that relies on the type of the input data and converts the 
attribute to a list of strings. If a list of str is passed in through a Jinja 
template, **self.bucket_key** ends up as a _**doubly-nested list of 
strings**_, rather than a list of strings.
   
   This is because, when used as a template_field, the input value of 
**bucket_key** can only be a string representing the template; template 
fields are only converted to their rendered values when the task instance 
is created. So __init__ sees the template string, wraps it in a list, and 
rendering then replaces that string with the pulled list, producing a 
nested list.
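   The interaction above can be sketched in plain Python (no Airflow); 
`init_bucket_key` and `render` are illustrative stand-ins for the sensor's 
__init__ wrapping and for native-object template rendering:

   ```
   # Sketch of the bug: __init__ wraps the template *string* in a list,
   # and rendering later replaces that string with the pulled list.

   def init_bucket_key(bucket_key):
       # Mirrors the current __init__ logic: wrap a str into a list.
       return [bucket_key] if isinstance(bucket_key, str) else bucket_key

   def render(value, rendered):
       # Stand-in for template rendering with render_template_as_native_obj:
       # each template string in the list becomes its rendered native value.
       return [rendered if isinstance(v, str) and "{{" in v else v
               for v in value]

   stored = init_bucket_key("{{ ti.xcom_pull(task_ids='t1') }}")
   # The XCom value is itself a list, so rendering yields a nested list.
   result = render(stored, ["s3://test_bucket/test_key1",
                            "s3://test_bucket/test_key2"])
   # result is [["s3://test_bucket/test_key1", "s3://test_bucket/test_key2"]]
   ```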
   
   Example log from __init__ function:
   ` scheduler | DEBUG | type: <class 'list'> |  val: ["{{ 
ti.xcom_pull(task_ids='t1') }}"]`
   
   Example log from poke function:
   `poke | DEBUG | type: <class 'list'> |  val: [["s3://test_bucket/test_key1", 
"s3://test_bucket/test_key2"]]`
   
   This leads to the poke function throwing an 
[exception](https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/hooks/s3.py#L172)
 because each individual key must be a string to parse the URL, but a list 
is passed instead (since self.bucket_key is a nested list).
   
   ### What you think should happen instead
   
   Instead of wrapping the input value of **bucket_key** in a list, we should 
store the value as-is upon initialization of the class and conditionally 
check the attribute's type within the poke function.
   
   [def 
\_\_init\_\_](https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/sensors/s3.py#L89)
   `self.bucket_key = bucket_key`
   (which will store the input value correctly as a str or a list once the task 
instance is created and the template fields are rendered)
   
   [def 
poke](https://github.com/apache/airflow/blob/main/airflow/providers/amazon/aws/sensors/s3.py#L127)
   ```
   def poke(self, context: Context):
       if isinstance(self.bucket_key, str):
           return self._check_key(self.bucket_key)
       return all(self._check_key(key) for key in self.bucket_key)
   ```
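   The proposed logic can be exercised standalone; `FakeSensor` and its 
`_check_key` stub are illustrative, not the real S3 lookup:

   ```
   # Standalone sketch of the proposed poke logic (no Airflow imports).

   class FakeSensor:
       def __init__(self, bucket_key):
           # Store as-is; rendering supplies either a str or a list[str].
           self.bucket_key = bucket_key

       def _check_key(self, key):
           # Stub: the real code parses the S3 URL, so key must be a str.
           assert isinstance(key, str)
           return key.startswith("s3://")

       def poke(self):
           if isinstance(self.bucket_key, str):
               return self._check_key(self.bucket_key)
           return all(self._check_key(k) for k in self.bucket_key)

   # Both a single key and a rendered list of keys now work.
   FakeSensor("s3://test_bucket/test_key1").poke()
   FakeSensor(["s3://test_bucket/test_key1",
               "s3://test_bucket/test_key2"]).poke()
   ```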
   
   ### How to reproduce
   
   1. Use a template field as the bucket_key attribute in S3KeySensor
   2. Pass a list of strings as the rendered template input value for the 
bucket_key attribute in the S3KeySensor task. (e.g. as an XCOM or Variable 
pulled value)
   
   Example:
   
   ```
   with DAG(
       ...
       render_template_as_native_obj=True,
   ) as dag:

       @task(task_id="get_list_of_str", do_xcom_push=True)
       def get_list_of_str():
           return ["s3://test_bucket/test_key1", "s3://test_bucket/test_key2"]

       t = get_list_of_str()

       op = S3KeySensor(
           task_id="s3_key_sensor",
           bucket_key="{{ ti.xcom_pull(task_ids='get_list_of_str') }}",
       )

       t >> op
   ```
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   
