kacpermuda opened a new issue, #40598:
URL: https://github.com/apache/airflow/issues/40598

   ### Apache Airflow Provider(s)
   
   amazon, openlineage
   
   ### Versions of Apache Airflow Providers
   
   Tested on both the main branch and the latest PyPI versions:
   apache-airflow-providers-openlineage==1.8.0
   apache-airflow-providers-amazon==8.25.0
   
   ### Apache Airflow version
   
   main branch and 2.9.2
   
   ### Operating System
   
   macOS 14.5
   
   ### Deployment
   
   Astronomer
   
   ### Deployment details
   
   astro-runtime:11.6.0 (tested with both astro dev and an actual deployment; 
they should behave the same, but I verified both)
   
   Also tested with breeze on main branch
   
   ### What happened
   
   When running a simple DAG with AthenaOperator, I sometimes do not receive 
OpenLineage DAG complete events. The behaviour looks flaky, and I have not yet 
found the cause. I see nothing suspicious in the scheduler logs.
   
   I'm not sure how the choice of task (AthenaOperator) could cause a missing 
DAG complete event, since that event is emitted from the scheduler, so it's 
possible that I did something wrong here. Let me know if you are able to 
reproduce this behaviour.
   
   ### What you think should happen instead
   
   We should always receive OpenLineage DAG complete events.
   
   ### How to reproduce
   
   I've run this DAG a couple of times and did not receive DAG complete events.
   
   On Astro, this is my Dockerfile:
   ```
   FROM quay.io/astronomer/astro-runtime:11.6.0
   ENV AIRFLOW__OPENLINEAGE__TRANSPORT='{"type": "console"}'
   ENV AIRFLOW__LOGGING__LOGGING_LEVEL=DEBUG
   ```
   
   <details>
   <summary>Example DAG</summary> 
   
   ```
   from airflow.providers.amazon.aws.operators.athena import AthenaOperator
   from airflow import DAG
   import datetime as dt
   import random
   import string
   
   suffix = ''.join(random.choices(string.ascii_lowercase + string.digits, k=3))
   table_name = f"t_{suffix}"
   
   query = f"""
   CREATE TABLE {table_name} AS
   SELECT
       UPPER(name) AS x,
       age * 2 AS y
   FROM
       workers_csv;
   """
   
   with DAG(
       dag_id='athena',
       start_date=dt.datetime(2024, 5, 21),
       schedule=None
   ) as dag:
       task = AthenaOperator(
           task_id="task",
           aws_conn_id="aws",
           query=query,
           database="default",
           output_location="s3://<bucket-name>/results",
           deferrable=False,
           region="eu-central-1",
       )
   ```
   
   </details>
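
   With the console transport, one way to check whether a DAG complete event 
was emitted is to scan the saved scheduler log for embedded OpenLineage JSON 
and count event types per job. The following is only a sketch, not part of the 
original report. It assumes each event is logged as a single-line JSON object 
after the usual log prefix, and that the DAG-level job name equals the dag_id 
(`athena`) while task-level jobs are named `<dag_id>.<task_id>`. The helper 
names `extract_events` and `count_event_types` are hypothetical.

```python
import json
from collections import Counter


def extract_events(log_text):
    """Yield JSON objects that look like OpenLineage run events.

    Assumption: each event sits on one log line, possibly after a prefix
    such as "[timestamp] {file.py:NN} INFO - ".
    """
    decoder = json.JSONDecoder()
    for line in log_text.splitlines():
        start = line.find("{")
        while start != -1:
            try:
                obj, end = decoder.raw_decode(line, start)
            except json.JSONDecodeError:
                # Not JSON at this brace (e.g. "{file.py:NN}"); try the next one.
                start = line.find("{", start + 1)
                continue
            if (
                isinstance(obj, dict)
                and "eventType" in obj
                and isinstance(obj.get("job"), dict)
            ):
                yield obj
            start = line.find("{", end)


def count_event_types(log_text, job_name):
    """Count eventType values (START, COMPLETE, ...) for one job name."""
    counts = Counter()
    for event in extract_events(log_text):
        if event["job"].get("name") == job_name:
            counts[event["eventType"]] += 1
    return counts


if __name__ == "__main__":
    # Made-up lines for illustration only, not real scheduler output.
    sample = "\n".join(
        [
            '[ts] {x.py:1} INFO - {"eventType": "START", '
            '"job": {"namespace": "default", "name": "athena"}}',
            '[ts] {x.py:1} INFO - {"eventType": "COMPLETE", '
            '"job": {"namespace": "default", "name": "athena.task"}}',
        ]
    )
    counts = count_event_types(sample, "athena")
    if counts["COMPLETE"] == 0:
        print("No DAG-level COMPLETE event found for job 'athena'")
```

   Run against the attached log files, a zero count for `COMPLETE` on the 
`athena` job alongside a non-zero `START` count would match the behaviour 
described above.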
   
   ### Anything else
   
   Scheduler logs from astro dev: 
   
[astro_scheduler_logs.txt](https://github.com/user-attachments/files/16095130/astro_scheduler_logs.txt)
   
   Scheduler and task logs from breeze:
   
[breeze_scheduler_logs.txt](https://github.com/user-attachments/files/16095233/breeze_scheduler_logs.txt)
   
[breeze_task_logs.log](https://github.com/user-attachments/files/16095234/breeze_task_logs.log)
   
   Marquez events:
   
![marquez_events](https://github.com/apache/airflow/assets/98873404/0ad66d3e-5e39-49dc-9e69-90dfec3776a0)
   
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
