pvieito opened a new issue, #43634:
URL: https://github.com/apache/airflow/issues/43634

   ### Apache Airflow version
   
   2.10.2
   
   ### If "Other Airflow 2 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   We are trying to migrate all our DAGs to use deferrable operators. I set it as the default in our MWAA configuration:
   
   
   | Configuration option | Value |
   | -- | -- |
   | operators.default_deferrable | true |
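
   For reference, the MWAA override above should be equivalent to this `airflow.cfg` entry (assuming MWAA maps overrides to sections the usual way):

   ```ini
   [operators]
   default_deferrable = true
   ```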
   
   This seemed to work properly for a week, but today all tasks started getting stuck in the DEFERRED state. For example:
   
   A typical (healthy) run:
   
   ```
   ip-10-5-146-175.eu-west-1.compute.internal
   *** Reading remote log from Cloudwatch log_group: 
airflow-DT-LAB-Airflow-Environment-Task log_stream: 
dag_id=DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1/run_id=scheduled__2024-11-04T08_00_00+00_00/task_id=batch_job/attempt=1.log.
   [2024-11-04, 09:00:15 UTC] {local_task_job_runner.py:123} ▶ Pre task 
execution logs
   [2024-11-04, 09:00:15 UTC] {batch.py:303} INFO - Running AWS Batch job - job 
definition: 
arn:aws:batch:eu-west-1:532132528437:job-definition/DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1:358
 - on queue DT-LAB-BatchExecution-Standard
   [2024-11-04, 09:00:15 UTC] {batch.py:310} INFO - AWS Batch job - container 
overrides: {'environment': [{'name': '_DATATOOLS_DAG_DEPLOYMENT_IDENTIFIER', 
'value': 'DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1'}, {'name': 
'_DATATOOLS_DAG_DEPLOYMENT_TARGET', 'value': 'airflow'}, {'name': 
'DATATOOLS_ENVIRONMENT', 'value': 'LAB'}, {'name': 
'_DATATOOLS_AIRFLOW_DEPLOYMENT_DAG_ID', 'value': 
'DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1'}, {'name': 
'_DATATOOLS_AIRFLOW_DEPLOYMENT_ENVIRONMENT_NAME', 'value': 
'shopfully-data-execution'}, {'name': 
'_DATATOOLS_AIRFLOW_DEPLOYMENT_DATA_INTERVAL_START_DATETIME', 'value': 
'2024-11-04 08:00:00+00:00'}, {'name': 
'_DATATOOLS_AIRFLOW_DEPLOYMENT_DATA_INTERVAL_END_DATETIME', 'value': 
'2024-11-04 09:00:00+00:00'}, {'name': 
'_DATATOOLS_PROCESS_EXTRA_ENVIRONMENT_BLOB', 'value': ''}]}
   [2024-11-04, 09:00:15 UTC] {base.py:84} INFO - Retrieving connection 
'aws_shopfully_data_execution'
   [2024-11-04, 09:00:17 UTC] {batch.py:347} INFO - AWS Batch job 
(16ffa872-59c1-4eb9-a7b4-50622364db0b) started: {'ResponseMetadata': 
{'RequestId': '18c85cda-9a0e-4347-8801-ac6e5a1da429', 'HTTPStatusCode': 200, 
'HTTPHeaders': {'date': 'Mon, 04 Nov 2024 09:00:17 GMT', 'content-type': 
'application/json', 'content-length': '198', 'connection': 'keep-alive', 
'x-amzn-requestid': '18c85cda-9a0e-4347-8801-ac6e5a1da429', 
'access-control-allow-origin': '*', 'x-amz-apigw-id': 'Atr9NEPzDoEEWPg=', 
'access-control-expose-headers': 
'X-amzn-errortype,X-amzn-requestid,X-amzn-errormessage,X-amzn-trace-id,X-amz-apigw-id,date',
 'x-amzn-trace-id': 'Root=1-67288d20-10faae0a2da5180b3c952dc2'}, 
'RetryAttempts': 0}, 'jobArn': 
'arn:aws:batch:eu-west-1:532132528437:job/16ffa872-59c1-4eb9-a7b4-50622364db0b',
 'jobName': 'DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1', 'jobId': 
'16ffa872-59c1-4eb9-a7b4-50622364db0b'}
   [2024-11-04, 09:00:17 UTC] {taskinstance.py:288} INFO - Pausing task as 
DEFERRED. dag_id=DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1, 
task_id=batch_job, run_id=scheduled__2024-11-04T08:00:00+00:00, 
execution_date=20241104T080000, start_date=20241104T090015
   [2024-11-04, 09:00:17 UTC] {taskinstance.py:340} ▶ Post task execution logs
   [2024-11-04, 09:00:18 UTC] {base.py:84} INFO - Retrieving connection 
'aws_shopfully_data_execution'
   [2024-11-04, 09:00:18 UTC] {waiter_with_logging.py:129} INFO - Batch job 
16ffa872-59c1-4eb9-a7b4-50622364db0b not ready yet: ['SUBMITTED']
   [2024-11-04, 09:00:48 UTC] {waiter_with_logging.py:129} INFO - Batch job 
16ffa872-59c1-4eb9-a7b4-50622364db0b not ready yet: ['STARTING']
   [2024-11-04, 09:01:19 UTC] {waiter_with_logging.py:129} INFO - Batch job 
16ffa872-59c1-4eb9-a7b4-50622364db0b not ready yet: ['STARTING']
   [2024-11-04, 09:01:49 UTC] {waiter_with_logging.py:129} INFO - Batch job 
16ffa872-59c1-4eb9-a7b4-50622364db0b not ready yet: ['RUNNING']
   [2024-11-04, 09:02:19 UTC] {waiter_with_logging.py:129} INFO - Batch job 
16ffa872-59c1-4eb9-a7b4-50622364db0b not ready yet: ['RUNNING']
   [2024-11-04, 09:02:49 UTC] {waiter_with_logging.py:129} INFO - Batch job 
16ffa872-59c1-4eb9-a7b4-50622364db0b not ready yet: ['RUNNING']
   [2024-11-04, 09:03:19 UTC] {triggerer_job_runner.py:631} INFO - Trigger 
DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1/scheduled__2024-11-04T08:00:00+00:00/batch_job/-1/1
 (ID 4448) fired: TriggerEvent<{'status': 'success', 'job_id': 
'16ffa872-59c1-4eb9-a7b4-50622364db0b'}>
   [2024-11-04, 09:03:23 UTC] {local_task_job_runner.py:123} ▶ Pre task 
execution logs
   [2024-11-04, 09:03:23 UTC] {batch.py:290} INFO - Job completed.
   [2024-11-04, 09:03:24 UTC] {taskinstance.py:340} ▶ Post task execution logs
   ```
   
   All runs since around 10:00 UTC today look like this:
   
   ```
   ip-10-5-146-175.eu-west-1.compute.internal
   *** Reading remote log from Cloudwatch log_group: 
airflow-DT-LAB-Airflow-Environment-Task log_stream: 
dag_id=DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1/run_id=scheduled__2024-11-04T09_00_00+00_00/task_id=batch_job/attempt=1.log.
   [2024-11-04, 10:00:13 UTC] {local_task_job_runner.py:123} ▶ Pre task 
execution logs
   [2024-11-04, 10:00:14 UTC] {batch.py:303} INFO - Running AWS Batch job - job 
definition: 
arn:aws:batch:eu-west-1:532132528437:job-definition/DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1:358
 - on queue DT-LAB-BatchExecution-Standard
   [2024-11-04, 10:00:14 UTC] {batch.py:310} INFO - AWS Batch job - container 
overrides: {'environment': [{'name': '_DATATOOLS_DAG_DEPLOYMENT_IDENTIFIER', 
'value': 'DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1'}, {'name': 
'_DATATOOLS_DAG_DEPLOYMENT_TARGET', 'value': 'airflow'}, {'name': 
'DATATOOLS_ENVIRONMENT', 'value': 'LAB'}, {'name': 
'_DATATOOLS_AIRFLOW_DEPLOYMENT_DAG_ID', 'value': 
'DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1'}, {'name': 
'_DATATOOLS_AIRFLOW_DEPLOYMENT_ENVIRONMENT_NAME', 'value': 
'shopfully-data-execution'}, {'name': 
'_DATATOOLS_AIRFLOW_DEPLOYMENT_DATA_INTERVAL_START_DATETIME', 'value': 
'2024-11-04 09:00:00+00:00'}, {'name': 
'_DATATOOLS_AIRFLOW_DEPLOYMENT_DATA_INTERVAL_END_DATETIME', 'value': 
'2024-11-04 10:00:00+00:00'}, {'name': 
'_DATATOOLS_PROCESS_EXTRA_ENVIRONMENT_BLOB', 'value': ''}]}
   [2024-11-04, 10:00:14 UTC] {base.py:84} INFO - Retrieving connection 
'aws_shopfully_data_execution'
   [2024-11-04, 10:00:15 UTC] {batch.py:347} INFO - AWS Batch job 
(c6157b39-4197-4a98-9a0d-2310866a2495) started: {'ResponseMetadata': 
{'RequestId': 'f7570782-3748-4d95-97b6-2fcc29820dd4', 'HTTPStatusCode': 200, 
'HTTPHeaders': {'date': 'Mon, 04 Nov 2024 10:00:15 GMT', 'content-type': 
'application/json', 'content-length': '198', 'connection': 'keep-alive', 
'x-amzn-requestid': 'f7570782-3748-4d95-97b6-2fcc29820dd4', 
'access-control-allow-origin': '*', 'x-amz-apigw-id': 'At0vfHotjoEEbJQ=', 
'access-control-expose-headers': 
'X-amzn-errortype,X-amzn-requestid,X-amzn-errormessage,X-amzn-trace-id,X-amz-apigw-id,date',
 'x-amzn-trace-id': 'Root=1-67289b2f-4f45249a51c82f7f02fdaa9d'}, 
'RetryAttempts': 0}, 'jobArn': 
'arn:aws:batch:eu-west-1:532132528437:job/c6157b39-4197-4a98-9a0d-2310866a2495',
 'jobName': 'DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1', 'jobId': 
'c6157b39-4197-4a98-9a0d-2310866a2495'}
   [2024-11-04, 10:00:15 UTC] {taskinstance.py:288} INFO - Pausing task as 
DEFERRED. dag_id=DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1, 
task_id=batch_job, run_id=scheduled__2024-11-04T09:00:00+00:00, 
execution_date=20241104T090000, start_date=20241104T100013
   [2024-11-04, 10:00:15 UTC] {taskinstance.py:340} ▶ Post task execution logs
   ```
   
   No further logs from `waiter_with_logging` or the triggerer appear after the task is deferred.
   
   ### What you think should happen instead?
   
   _No response_
   
   ### How to reproduce
   
   Use Airflow 2.10 with deferrable operators enabled by default (`operators.default_deferrable = true`).
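
   A minimal sketch of an affected task, assuming a standard `BatchOperator` setup (the DAG id, job definition, queue, and schedule here are illustrative placeholders, not our exact DAG code):

   ```python
   from datetime import datetime

   from airflow import DAG
   from airflow.providers.amazon.aws.operators.batch import BatchOperator

   # Placeholder names; the real DAG uses our internal job definition and queue.
   with DAG(
       dag_id="remote_execution_check",
       start_date=datetime(2024, 11, 1),
       schedule="@hourly",
       catchup=False,
   ) as dag:
       batch_job = BatchOperator(
           task_id="batch_job",
           job_name="remote-execution-check",
           job_definition="DT-LAB-DataTools-Schedule-RemoteExecutionCheck-1",
           job_queue="DT-LAB-BatchExecution-Standard",
           aws_conn_id="aws_shopfully_data_execution",
           # Normally inherited from operators.default_deferrable;
           # made explicit here for clarity.
           deferrable=True,
       )
   ```

   With `deferrable=True`, the operator submits the job and hands the wait over to the triggerer, which is where the task gets stuck.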
   
   ### Operating System
   
   MWAA
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Amazon (AWS) MWAA
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

