0x26res commented on issue #40435:
URL: https://github.com/apache/airflow/issues/40435#issuecomment-2273948381
Job of a failed task:
```
[2024-08-03, 00:46:55 UTC] {{taskinstance.py:1956}} INFO - Dependencies all
met for dep_context=non-requeueable deps ti=<TaskInstance: eod_jobs.task_123
scheduled__2024-08-02T00:15:00+00:00 [queued]>
[2024-08-03, 00:46:56 UTC] {{taskinstance.py:1956}} INFO - Dependencies all
met for dep_context=requeueable deps ti=<TaskInstance: eod_jobs.task_123
scheduled__2024-08-02T00:15:00+00:00 [queued]>
[2024-08-03, 00:46:56 UTC] {{taskinstance.py:2170}} INFO - Starting attempt
1 of 1
[2024-08-03, 00:46:56 UTC] {{taskinstance.py:2191}} INFO - Executing
<Task(BatchOperator): task_123> on 2024-08-02 00:15:00+00:00
[2024-08-03, 00:46:56 UTC] {{standard_task_runner.py:60}} INFO - Started
process 26709 to run task
[2024-08-03, 00:46:57 UTC] {{taskinstance.py:2480}} INFO - Exporting env
vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='eod_jobs'
AIRFLOW_CTX_TASK_ID='task_123'
AIRFLOW_CTX_EXECUTION_DATE='2024-08-02T00:15:00+00:00'
AIRFLOW_CTX_TRY_NUMBER='1'
AIRFLOW_CTX_DAG_RUN_ID='scheduled__2024-08-02T00:15:00+00:00'
[2024-08-03, 00:46:57 UTC] {{batch.py:274}} INFO - Running AWS Batch job -
job definition: some_job_definition - on queue some-queue
[2024-08-03, 00:46:57 UTC] {{batch.py:281}} INFO - AWS Batch job - container
overrides: {'command': ['bash', '-c', 'python -m some_module.batch.task_123
--date=2024-08-02'], 'environment': []}
[2024-08-03, 00:46:57 UTC] {{base_aws.py:161}} INFO - No connection ID
provided. Fallback on boto3 credential strategy (region_name=None). See:
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
[2024-08-03, 00:46:59 UTC] {{batch.py:312}} INFO - AWS Batch job (JOB_UUID)
started: REDACTED
[2024-08-03, 00:47:01 UTC] {{taskinstance.py:2344}} INFO - Pausing task as
DEFERRED. dag_id=eod_jobs, task_id=task_123, execution_date=20240802T001500,
start_date=20240803T004656
[2024-08-03, 00:47:02 UTC] {{base_aws.py:161}} INFO - No connection ID
provided. Fallback on boto3 credential strategy (region_name=None). See:
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
[2024-08-03, 00:47:07 UTC] {{waiter_with_logging.py:129}} INFO - Batch job
JOB_UUID not ready yet: ['STARTING']
[2024-08-03, 00:47:22 UTC] {{local_task_job_runner.py:302}} WARNING - State
of this instance has been externally set to deferred. Terminating instance.
[2024-08-03, 00:47:23 UTC] {{process_utils.py:131}} INFO - Sending 15 to
group 26709. PIDs of all processes in the group: [26709]
[2024-08-03, 00:47:24 UTC] {{process_utils.py:86}} INFO - Sending the signal
15 to group 26709
[2024-08-03, 00:47:37 UTC] {{waiter_with_logging.py:129}} INFO - Batch job
JOB_UUID not ready yet: ['STARTING']
[2024-08-03, 00:47:44 UTC] {{process_utils.py:79}} INFO - Process
psutil.Process(pid=26709, status='terminated', exitcode=100,
started='00:46:56') (26709) terminated with exit code 100
[2024-08-03, 00:48:07 UTC] {{waiter_with_logging.py:129}} INFO - Batch job
JOB_UUID not ready yet: ['STARTING']
[2024-08-03, 00:48:37 UTC] {{waiter_with_logging.py:126}} ERROR - Failure
while running batch job JOB_UUID: ['FAILED']
[2024-08-03, 00:56:04 UTC] {{taskinstance.py:1956}} INFO - Dependencies all
met for dep_context=non-requeueable deps ti=<TaskInstance: eod_jobs.task_123
scheduled__2024-08-02T00:15:00+00:00 [queued]>
[2024-08-03, 00:56:04 UTC] {{taskinstance.py:1956}} INFO - Dependencies all
met for dep_context=requeueable deps ti=<TaskInstance: eod_jobs.task_123
scheduled__2024-08-02T00:15:00+00:00 [queued]>
[2024-08-03, 00:56:04 UTC] {{taskinstance.py:2168}} INFO - Resuming after
deferral
[2024-08-03, 00:56:04 UTC] {{taskinstance.py:2191}} INFO - Executing
<Task(BatchOperator): task_123> on 2024-08-02 00:15:00+00:00
[2024-08-03, 00:56:04 UTC] {{standard_task_runner.py:60}} INFO - Started
process 27106 to run task
[2024-08-03, 00:56:04 UTC] {{standard_task_runner.py:87}} INFO - Running:
['airflow', 'tasks', 'run', 'eod_jobs', 'task_123',
'scheduled__2024-08-02T00:15:00+00:00', '--job-id', '36971', '--raw',
'--subdir', 'DAGS_FOLDER/eod_jobs.py', '--cfg-path', '/tmp/tmpf1bgxn6c']
[2024-08-03, 00:56:04 UTC] {{standard_task_runner.py:88}} INFO - Job 36971:
Subtask task_123
[2024-08-03, 00:56:05 UTC] {{baseoperator.py:1598}} ERROR - Trigger failed:
Traceback (most recent call last):
File
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/utils/waiter_with_logging.py",
line 123, in async_wait
await waiter.wait(**args, WaiterConfig={"MaxAttempts": 1})
File
"/usr/local/airflow/.local/lib/python3.11/site-packages/aiobotocore/waiter.py",
line 49, in wait
return await AIOWaiter.wait(self, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/usr/local/airflow/.local/lib/python3.11/site-packages/aiobotocore/waiter.py",
line 126, in wait
raise WaiterError(
botocore.exceptions.WaiterError: Waiter batch_job_complete failed: Waiter
encountered a terminal failure state: For expression "jobs[].status" all
members matched excepted path: "FAILED"
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/jobs/triggerer_job_runner.py",
line 529, in cleanup_finished_triggers
result = details["task"].result()
^^^^^^^^^^^^^^^^^^^^^^^^
File
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/jobs/triggerer_job_runner.py",
line 601, in run_trigger
async for event in trigger.run():
File
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/triggers/base.py",
line 143, in run
await async_wait(
File
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/utils/waiter_with_logging.py",
line 127, in async_wait
raise AirflowException(f"{failure_message}: {error}")
airflow.exceptions.AirflowException: Failure while running batch job
JOB_UUID: Waiter batch_job_complete failed: Waiter encountered a terminal
failure state: For expression "jobs[].status" all members matched excepted
path: "FAILED"
[2024-08-03, 00:56:05 UTC] {{taskinstance.py:2698}} ERROR - Task failed with
exception
Traceback (most recent call last):
File
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py",
line 433, in _execute_task
result = execute_callable(context=context, **execute_callable_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/models/baseoperator.py",
line 1599, in resume_execution
raise TaskDeferralError(next_kwargs.get("error", "Unknown"))
airflow.exceptions.TaskDeferralError: Trigger failure
[2024-08-03, 00:56:05 UTC] {{taskinstance.py:1138}} INFO - Marking task as
FAILED. dag_id=eod_jobs, task_id=task_123, execution_date=20240802T001500,
start_date=20240803T004656, end_date=20240803T005605
[2024-08-03, 00:56:05 UTC] {{logging_mixin.py:188}} WARNING -
/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/providers/slack/hooks/slack_webhook.py:42
UserWarning: You cannot override the default channel (chosen by the user who
installed your app), username, or icon when you're using Incoming Webhooks to
post messages. Instead, these values will always inherit from the associated
Slack app configuration. See:
https://api.slack.com/messaging/webhooks#advanced_message_formatting. It is
possible to change this values only in Legacy Slack Integration Incoming
Webhook:
https://api.slack.com/legacy/custom-integrations/messaging/webhooks#legacy-customizations
[2024-08-03, 00:56:06 UTC] {{standard_task_runner.py:107}} ERROR - Failed to
execute job 36971 for task task_123 (Trigger failure; 27106)
[2024-08-03, 00:56:07 UTC] {{local_task_job_runner.py:234}} INFO - Task
exited with return code 1
[2024-08-03, 00:56:07 UTC] {{taskinstance.py:3280}} INFO - 0 downstream
tasks scheduled from follow-on schedule check
```
Logs of a good task (same task after retry):
```
[2024-08-03, 10:35:19 UTC] {{taskinstance.py:1956}} INFO - Dependencies all
met for dep_context=non-requeueable deps ti=<TaskInstance: eod_jobs.task_123
scheduled__2024-08-02T00:15:00+00:00 [queued]>
[2024-08-03, 10:35:20 UTC] {{taskinstance.py:1956}} INFO - Dependencies all
met for dep_context=requeueable deps ti=<TaskInstance: eod_jobs.task_123
scheduled__2024-08-02T00:15:00+00:00 [queued]>
[2024-08-03, 10:35:20 UTC] {{taskinstance.py:2170}} INFO - Starting attempt
2 of 2
[2024-08-03, 10:35:20 UTC] {{taskinstance.py:2191}} INFO - Executing
<Task(BatchOperator): task_123> on 2024-08-02 00:15:00+00:00
[2024-08-03, 10:35:20 UTC] {{standard_task_runner.py:60}} INFO - Started
process 3875 to run task
[2024-08-03, 10:35:20 UTC] {{standard_task_runner.py:87}} INFO - Running:
['airflow', 'tasks', 'run', 'eod_jobs', 'task_123',
'scheduled__2024-08-02T00:15:00+00:00', '--job-id', '37770', '--raw',
'--subdir', 'DAGS_FOLDER/eod_jobs.py', '--cfg-path', '/tmp/tmpev95ge4j']
[2024-08-03, 10:35:20 UTC] {{standard_task_runner.py:88}} INFO - Job 37770:
Subtask task_123
[2024-08-03, 10:35:20 UTC] {{taskinstance.py:2480}} INFO - Exporting env
vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='eod_jobs'
AIRFLOW_CTX_TASK_ID='task_123'
AIRFLOW_CTX_EXECUTION_DATE='2024-08-02T00:15:00+00:00'
AIRFLOW_CTX_TRY_NUMBER='2'
AIRFLOW_CTX_DAG_RUN_ID='scheduled__2024-08-02T00:15:00+00:00'
[2024-08-03, 10:35:20 UTC] {{batch.py:274}} INFO - Running AWS Batch job -
job definition: some_job_definition - on queue some-queue
[2024-08-03, 10:35:20 UTC] {{batch.py:281}} INFO - AWS Batch job - container
overrides: {'command': ['bash', '-c', 'python -m some_module.batch.task_123
--date=2024-08-02'], 'environment': []}
[2024-08-03, 10:35:20 UTC] {{base_aws.py:161}} INFO - No connection ID
provided. Fallback on boto3 credential strategy (region_name=None). See:
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
[2024-08-03, 10:35:21 UTC] {{batch.py:312}} INFO - AWS Batch job (JOB_UUID)
started: REDACTED
[2024-08-03, 10:35:21 UTC] {{taskinstance.py:2344}} INFO - Pausing task as
DEFERRED. dag_id=eod_jobs, task_id=task_123, execution_date=20240802T001500,
start_date=20240803T103519
[2024-08-03, 10:35:21 UTC] {{local_task_job_runner.py:231}} INFO - Task
exited with return code 100 (task deferral)
[2024-08-03, 10:35:22 UTC] {{base_aws.py:161}} INFO - No connection ID
provided. Fallback on boto3 credential strategy (region_name=None). See:
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
[2024-08-03, 10:35:22 UTC] {{waiter_with_logging.py:129}} INFO - Batch job
JOB_UUID not ready yet: ['RUNNABLE']
[2024-08-03, 10:35:53 UTC] {{waiter_with_logging.py:129}} INFO - Batch job
JOB_UUID not ready yet: ['STARTING']
[2024-08-03, 10:36:23 UTC] {{waiter_with_logging.py:129}} INFO - Batch job
JOB_UUID not ready yet: ['STARTING']
[2024-08-03, 10:36:53 UTC] {{waiter_with_logging.py:129}} INFO - Batch job
JOB_UUID not ready yet: ['RUNNING']
[2024-08-03, 10:37:23 UTC] {{triggerer_job_runner.py:602}} INFO - Trigger
eod_jobs/scheduled__2024-08-02T00:15:00+00:00/task_123/-1/2 (ID 5992) fired:
TriggerEvent<{'status': 'success', 'job_id': 'JOB_UUID'}>
[2024-08-03, 10:37:28 UTC] {{taskinstance.py:1956}} INFO - Dependencies all
met for dep_context=non-requeueable deps ti=<TaskInstance: eod_jobs.task_123
scheduled__2024-08-02T00:15:00+00:00 [queued]>
[2024-08-03, 10:37:28 UTC] {{taskinstance.py:1956}} INFO - Dependencies all
met for dep_context=requeueable deps ti=<TaskInstance: eod_jobs.task_123
scheduled__2024-08-02T00:15:00+00:00 [queued]>
[2024-08-03, 10:37:28 UTC] {{taskinstance.py:2168}} INFO - Resuming after
deferral
[2024-08-03, 10:37:28 UTC] {{taskinstance.py:2191}} INFO - Executing
<Task(BatchOperator): task_123> on 2024-08-02 00:15:00+00:00
[2024-08-03, 10:37:28 UTC] {{standard_task_runner.py:60}} INFO - Started
process 3913 to run task
[2024-08-03, 10:37:28 UTC] {{standard_task_runner.py:87}} INFO - Running:
['airflow', 'tasks', 'run', 'eod_jobs', 'task_123',
'scheduled__2024-08-02T00:15:00+00:00', '--job-id', '37775', '--raw',
'--subdir', 'DAGS_FOLDER/eod_jobs.py', '--cfg-path', '/tmp/tmpzry41zjk']
[2024-08-03, 10:37:28 UTC] {{standard_task_runner.py:88}} INFO - Job 37775:
Subtask task_123
[2024-08-03, 10:37:29 UTC] {{batch.py:262}} INFO - Job completed.
[2024-08-03, 10:37:29 UTC] {{taskinstance.py:1138}} INFO - Marking task as
SUCCESS. dag_id=eod_jobs, task_id=task_123, execution_date=20240802T001500,
start_date=20240803T103519, end_date=20240803T103729
[2024-08-03, 10:37:29 UTC] {{local_task_job_runner.py:234}} INFO - Task
exited with return code 0
[2024-08-03, 10:37:29 UTC] {{taskinstance.py:3280}} INFO - 0 downstream
tasks scheduled from follow-on schedule check
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]