zhaorui2022 opened a new issue, #68585:
URL: https://github.com/apache/airflow/issues/68585

   ### Under which category would you file this issue?
   
   Providers
   
   ### Apache Airflow version
   
   3.2.2; also reproducible in older 3.x versions
   
   ### What happened and how to reproduce it?
   
   1. Set up an environment that uses S3 as the remote log backend
   2. Run any task (an EmptyOperator is enough)
   3. Inspect the OpenLineage event: the S3 log file URIs are listed as task outputs
   
   ### What you think should happen instead?
   
   S3 log files are not task outputs, and they should not be reported as task outputs in OpenLineage events.
   
   The likely cause is:
   1. S3Hook is used to upload task logs to S3 
https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/log/s3_task_handler.py#L136
   2. Inside S3Hook, load_string always emits OpenLineage lineage for the object it writes, which in this case is the task log 
https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/hooks/s3.py#L1248-L1250
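
To make the suspected mechanism concrete, here is a minimal, self-contained sketch that does not import Airflow. `ToyLineageCollector`, `ToyS3Hook`, `run_task_with_remote_logging`, and the `report_lineage` switch are all hypothetical names invented for illustration; they are not the real S3Hook API, and the bucket and key are made up. The sketch only shows why a hook that reports every write as an output would also report log uploads, and what suppressing that for the logging path might look like.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLineageCollector:
    """Hypothetical stand-in for a per-task lineage collector."""

    outputs: list = field(default_factory=list)

    def add_output(self, uri: str) -> None:
        self.outputs.append(uri)


class ToyS3Hook:
    """Hypothetical stand-in for an S3 hook; it performs no real upload."""

    def __init__(self, collector: ToyLineageCollector, report_lineage: bool = True):
        self.collector = collector
        # `report_lineage` is an invented switch, not an existing S3Hook parameter.
        self.report_lineage = report_lineage

    def load_string(self, string_data: str, key: str, bucket_name: str) -> None:
        # A real hook would upload `string_data` here.
        if self.report_lineage:
            # Every write is reported as an output, whoever the caller is.
            self.collector.add_output(f"s3://{bucket_name}/{key}")


def run_task_with_remote_logging(report_log_lineage: bool) -> list:
    collector = ToyLineageCollector()
    # The task itself writes nothing (think EmptyOperator).
    # Afterwards the log handler uploads the task log through the same kind of hook.
    log_hook = ToyS3Hook(collector, report_lineage=report_log_lineage)
    log_hook.load_string(
        "task log text",
        key="logs/dag/task/attempt=1.log",
        bucket_name="my-log-bucket",
    )
    return collector.outputs


print(run_task_with_remote_logging(True))   # ['s3://my-log-bucket/logs/dag/task/attempt=1.log']
print(run_task_with_remote_logging(False))  # []
```

With reporting left on, the log URI shows up as the only "output" of a task that produced nothing, which matches the behaviour described above. Whether the right fix is a switch on the hook, a change in the log handler, or filtering on the OpenLineage side is an open question for the maintainers.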
   
   ### Operating System
   
   _No response_
   
   ### Deployment
   
   None
   
   ### Apache Airflow Provider(s)
   
   amazon
   
   ### Versions of Apache Airflow Providers
   
   9.29.0; also reproducible in earlier versions
   
   ### Official Helm Chart version
   
   Not Applicable
   
   ### Kubernetes Version
   
   _No response_
   
   ### Helm Chart configuration
   
   _No response_
   
   ### Docker Image customizations
   
   Using the official 3.2.2 docker image
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

