mgorsk1 commented on issue #17238:
URL: https://github.com/apache/airflow/issues/17238#issuecomment-895140926


   OK, I think we've more or less figured out the cause. I'm curious what 
you think about our findings:
   1. Our DAG code contained `dagrun_timeout=timedelta(minutes=60)`. 
This code had been running in prod for 2 years on Airflow `1.10+`.
   2. According to the docs 
(https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/models/index.html),
 this option should not take effect if a DAG has `schedule_interval=None`, 
which was the case for all of our DAGs:
   ```
   dagrun_timeout (datetime.timedelta) – specify how long a DagRun should be up 
before timing out / failing, so that new DagRuns can be created. **The timeout 
is only enforced for scheduled DagRuns.**
   ```
   3. After migrating to Airflow 2, our DAGs, which previously took more than 
1 hour, ran significantly faster, so reaching the 60-minute timeout became 
highly unlikely. Whenever it did happen, though, the tasks were marked as 
skipped and subsequently as failed.
   The code below shows this inconsistency between the docs and the actual 
behavior (the timeout should not apply, but tasks are actually killed mid-run):
   ```
   import time
   from datetime import datetime, timedelta
   
   from airflow import DAG
   from airflow.operators.dummy import DummyOperator
   from airflow.operators.python import PythonOperator
   
   default_args = {"start_date": datetime(2021, 7, 30)}
   
   with DAG("dagrun_timeout_error", default_args=default_args,
            schedule_interval=None,
            dagrun_timeout=timedelta(seconds=60)) as dag:
       start = DummyOperator(task_id="start")
       end = DummyOperator(task_id="end")
       for i in range(5):
           prev = start
           for j in range(3):
               t = PythonOperator(
                   task_id=f"t-{i}-{j}", python_callable=lambda: time.sleep(120)
               )
               prev >> t
               prev = t
           t >> end
   ```
   So as I understand it, this is either a change in behavior that wasn't 
documented properly or a bug. Let me know what you think.
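   As a possible stopgap, `dagrun_timeout` could simply not be passed for 
DAGs that have no schedule. A minimal sketch, assuming that leaving the 
timeout unset is acceptable for such DAGs; `effective_dagrun_timeout` is a 
hypothetical helper name, not an Airflow API, and this has not been verified 
against the scheduler:

```python
from datetime import timedelta
from typing import Optional


def effective_dagrun_timeout(
    schedule_interval: Optional[object],
    dagrun_timeout: Optional[timedelta],
) -> Optional[timedelta]:
    """Hypothetical helper: pick the value to pass as DAG(dagrun_timeout=...).

    Assumption: a DAG with schedule_interval=None should get no timeout at
    all, which is how we read the docs quoted above.
    """
    if schedule_interval is None:
        # Unscheduled DAG: drop the timeout so runs are never timed out.
        return None
    # Scheduled DAG: keep whatever timeout was configured.
    return dagrun_timeout


# The value our DAGs (schedule_interval=None) would end up with.
timeout = effective_dagrun_timeout(None, timedelta(minutes=60))
```

   The result would then be passed as `dagrun_timeout=timeout` in the `DAG(...)` 
call. Note that this only covers DAGs without a schedule; it does nothing 
for manually triggered runs of a scheduled DAG.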
   
   Thanks to @dechoma for debugging this together.
   
   cc @jedcunningham @ephraimbuddy  

