pvieito opened a new issue, #43634: URL: https://github.com/apache/airflow/issues/43634
### Apache Airflow version

2.10.2

### If "Other Airflow 2 version" selected, which one?

_No response_

### What happened?

We are trying to migrate all our DAGs to use the deferrable operators, so I set it as the default in our MWAA configuration:

| Configuration option | Value |
| -- | -- |
| `operators.default_deferrable` | `true` |

This seemed to work properly for a week, but today all jobs started getting stuck in the DEFERRED status. For example, a typical run:

```
ip-10-5-146-175.eu-west-1.compute.internal
*** Reading remote log from Cloudwatch log_group: airflow-DT-LAB-Airflow-Environment-Task log_stream: dag_id=DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1/run_id=scheduled__2024-11-04T08_00_00+00_00/task_id=batch_job/attempt=1.log.
[2024-11-04, 09:00:15 UTC] {local_task_job_runner.py:123} ▶ Pre task execution logs
[2024-11-04, 09:00:15 UTC] {batch.py:303} INFO - Running AWS Batch job - job definition: arn:aws:batch:eu-west-1:532132528437:job-definition/DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1:358 - on queue DT-LAB-BatchExecution-Standard
[2024-11-04, 09:00:15 UTC] {batch.py:310} INFO - AWS Batch job - container overrides: {'environment': [{'name': '_DATATOOLS_DAG_DEPLOYMENT_IDENTIFIER', 'value': 'DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1'}, {'name': '_DATATOOLS_DAG_DEPLOYMENT_TARGET', 'value': 'airflow'}, {'name': 'DATATOOLS_ENVIRONMENT', 'value': 'LAB'}, {'name': '_DATATOOLS_AIRFLOW_DEPLOYMENT_DAG_ID', 'value': 'DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1'}, {'name': '_DATATOOLS_AIRFLOW_DEPLOYMENT_ENVIRONMENT_NAME', 'value': 'shopfully-data-execution'}, {'name': '_DATATOOLS_AIRFLOW_DEPLOYMENT_DATA_INTERVAL_START_DATETIME', 'value': '2024-11-04 08:00:00+00:00'}, {'name': '_DATATOOLS_AIRFLOW_DEPLOYMENT_DATA_INTERVAL_END_DATETIME', 'value': '2024-11-04 09:00:00+00:00'}, {'name': '_DATATOOLS_PROCESS_EXTRA_ENVIRONMENT_BLOB', 'value': ''}]}
[2024-11-04, 09:00:15 UTC] {base.py:84} INFO - Retrieving connection 'aws_shopfully_data_execution'
[2024-11-04, 09:00:17 UTC] {batch.py:347} INFO - AWS Batch job (16ffa872-59c1-4eb9-a7b4-50622364db0b) started: {'ResponseMetadata': {'RequestId': '18c85cda-9a0e-4347-8801-ac6e5a1da429', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Mon, 04 Nov 2024 09:00:17 GMT', 'content-type': 'application/json', 'content-length': '198', 'connection': 'keep-alive', 'x-amzn-requestid': '18c85cda-9a0e-4347-8801-ac6e5a1da429', 'access-control-allow-origin': '*', 'x-amz-apigw-id': 'Atr9NEPzDoEEWPg=', 'access-control-expose-headers': 'X-amzn-errortype,X-amzn-requestid,X-amzn-errormessage,X-amzn-trace-id,X-amz-apigw-id,date', 'x-amzn-trace-id': 'Root=1-67288d20-10faae0a2da5180b3c952dc2'}, 'RetryAttempts': 0}, 'jobArn': 'arn:aws:batch:eu-west-1:532132528437:job/16ffa872-59c1-4eb9-a7b4-50622364db0b', 'jobName': 'DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1', 'jobId': '16ffa872-59c1-4eb9-a7b4-50622364db0b'}
[2024-11-04, 09:00:17 UTC] {taskinstance.py:288} INFO - Pausing task as DEFERRED. dag_id=DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1, task_id=batch_job, run_id=scheduled__2024-11-04T08:00:00+00:00, execution_date=20241104T080000, start_date=20241104T090015
[2024-11-04, 09:00:17 UTC] {taskinstance.py:340} ▶ Post task execution logs
[2024-11-04, 09:00:18 UTC] {base.py:84} INFO - Retrieving connection 'aws_shopfully_data_execution'
[2024-11-04, 09:00:18 UTC] {waiter_with_logging.py:129} INFO - Batch job 16ffa872-59c1-4eb9-a7b4-50622364db0b not ready yet: ['SUBMITTED']
[2024-11-04, 09:00:48 UTC] {waiter_with_logging.py:129} INFO - Batch job 16ffa872-59c1-4eb9-a7b4-50622364db0b not ready yet: ['STARTING']
[2024-11-04, 09:01:19 UTC] {waiter_with_logging.py:129} INFO - Batch job 16ffa872-59c1-4eb9-a7b4-50622364db0b not ready yet: ['STARTING']
[2024-11-04, 09:01:49 UTC] {waiter_with_logging.py:129} INFO - Batch job 16ffa872-59c1-4eb9-a7b4-50622364db0b not ready yet: ['RUNNING']
[2024-11-04, 09:02:19 UTC] {waiter_with_logging.py:129} INFO - Batch job 16ffa872-59c1-4eb9-a7b4-50622364db0b not ready yet: ['RUNNING']
[2024-11-04, 09:02:49 UTC] {waiter_with_logging.py:129} INFO - Batch job 16ffa872-59c1-4eb9-a7b4-50622364db0b not ready yet: ['RUNNING']
[2024-11-04, 09:03:19 UTC] {triggerer_job_runner.py:631} INFO - Trigger DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1/scheduled__2024-11-04T08:00:00+00:00/batch_job/-1/1 (ID 4448) fired: TriggerEvent<{'status': 'success', 'job_id': '16ffa872-59c1-4eb9-a7b4-50622364db0b'}>
[2024-11-04, 09:03:23 UTC] {local_task_job_runner.py:123} ▶ Pre task execution logs
[2024-11-04, 09:03:23 UTC] {batch.py:290} INFO - Job completed.
[2024-11-04, 09:03:24 UTC] {taskinstance.py:340} ▶ Post task execution logs
```

All runs since today at around 10:00 UTC look like this instead:

```
ip-10-5-146-175.eu-west-1.compute.internal
*** Reading remote log from Cloudwatch log_group: airflow-DT-LAB-Airflow-Environment-Task log_stream: dag_id=DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1/run_id=scheduled__2024-11-04T09_00_00+00_00/task_id=batch_job/attempt=1.log.
[2024-11-04, 10:00:13 UTC] {local_task_job_runner.py:123} ▶ Pre task execution logs
[2024-11-04, 10:00:14 UTC] {batch.py:303} INFO - Running AWS Batch job - job definition: arn:aws:batch:eu-west-1:532132528437:job-definition/DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1:358 - on queue DT-LAB-BatchExecution-Standard
[2024-11-04, 10:00:14 UTC] {batch.py:310} INFO - AWS Batch job - container overrides: {'environment': [{'name': '_DATATOOLS_DAG_DEPLOYMENT_IDENTIFIER', 'value': 'DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1'}, {'name': '_DATATOOLS_DAG_DEPLOYMENT_TARGET', 'value': 'airflow'}, {'name': 'DATATOOLS_ENVIRONMENT', 'value': 'LAB'}, {'name': '_DATATOOLS_AIRFLOW_DEPLOYMENT_DAG_ID', 'value': 'DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1'}, {'name': '_DATATOOLS_AIRFLOW_DEPLOYMENT_ENVIRONMENT_NAME', 'value': 'shopfully-data-execution'}, {'name': '_DATATOOLS_AIRFLOW_DEPLOYMENT_DATA_INTERVAL_START_DATETIME', 'value': '2024-11-04 09:00:00+00:00'}, {'name': '_DATATOOLS_AIRFLOW_DEPLOYMENT_DATA_INTERVAL_END_DATETIME', 'value': '2024-11-04 10:00:00+00:00'}, {'name': '_DATATOOLS_PROCESS_EXTRA_ENVIRONMENT_BLOB', 'value': ''}]}
[2024-11-04, 10:00:14 UTC] {base.py:84} INFO - Retrieving connection 'aws_shopfully_data_execution'
[2024-11-04, 10:00:15 UTC] {batch.py:347} INFO - AWS Batch job (c6157b39-4197-4a98-9a0d-2310866a2495) started: {'ResponseMetadata': {'RequestId': 'f7570782-3748-4d95-97b6-2fcc29820dd4', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Mon, 04 Nov 2024 10:00:15 GMT', 'content-type': 'application/json', 'content-length': '198', 'connection': 'keep-alive', 'x-amzn-requestid': 'f7570782-3748-4d95-97b6-2fcc29820dd4', 'access-control-allow-origin': '*', 'x-amz-apigw-id': 'At0vfHotjoEEbJQ=', 'access-control-expose-headers': 'X-amzn-errortype,X-amzn-requestid,X-amzn-errormessage,X-amzn-trace-id,X-amz-apigw-id,date', 'x-amzn-trace-id': 'Root=1-67289b2f-4f45249a51c82f7f02fdaa9d'}, 'RetryAttempts': 0}, 'jobArn': 'arn:aws:batch:eu-west-1:532132528437:job/c6157b39-4197-4a98-9a0d-2310866a2495', 'jobName': 'DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1', 'jobId': 'c6157b39-4197-4a98-9a0d-2310866a2495'}
[2024-11-04, 10:00:15 UTC] {taskinstance.py:288} INFO - Pausing task as DEFERRED. dag_id=DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1, task_id=batch_job, run_id=scheduled__2024-11-04T09:00:00+00:00, execution_date=20241104T090000, start_date=20241104T100013
[2024-11-04, 10:00:15 UTC] {taskinstance.py:340} ▶ Post task execution logs
```

After "Pausing task as DEFERRED" there are no further logs from `waiter_with_logging` etc., and the tasks stay stuck in DEFERRED.

### What you think should happen instead?

_No response_

### How to reproduce

Use Airflow 2.10 with deferrable operators.

### Operating System

MWAA

### Versions of Apache Airflow Providers

_No response_

### Deployment

Amazon (AWS) MWAA

### Deployment details

_No response_

### Anything else?

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!
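For anyone triaging: the handshake the healthy log above walks through — the worker pauses the task as DEFERRED, the triggerer polls the Batch job status via a waiter, and a fired `TriggerEvent` lets the task resume — can be sketched in plain `asyncio`. This is a toy illustration of the mechanism only, not Airflow's actual implementation; the class and status names are stand-ins. It shows why the absence of `waiter_with_logging` lines after deferral suggests the triggerer never ran the trigger:

```python
import asyncio


class ToyBatchTrigger:
    """Stand-in for a Batch job trigger: polls a status source until done."""

    def __init__(self, job_id, statuses):
        self.job_id = job_id
        self._statuses = iter(statuses)  # e.g. SUBMITTED -> STARTING -> RUNNING -> SUCCEEDED

    async def run(self):
        # Mirrors the waiter loop in the logs: poll until a terminal state,
        # then fire an event (like TriggerEvent<{'status': 'success', ...}>).
        for status in self._statuses:
            if status == "SUCCEEDED":
                yield {"status": "success", "job_id": self.job_id}
                return
            await asyncio.sleep(0)  # real code would sleep ~30s between polls


async def run_task(trigger):
    # Worker side: after "Pausing task as DEFERRED", the task only resumes
    # if the triggerer runs the trigger and an event fires.
    async for event in trigger.run():
        return f"Job completed: {event['job_id']}"
    return "stuck in DEFERRED"  # no event ever fires: what this issue observes


ok = ToyBatchTrigger("16ffa872", ["SUBMITTED", "STARTING", "RUNNING", "SUCCEEDED"])
print(asyncio.run(run_task(ok)))  # Job completed: 16ffa872
```

In the failing runs, the equivalent of the polling loop never produces any output at all, which is why the logs end at the "Pausing task as DEFERRED" line.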
### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)