0x26res commented on issue #40435:
URL: https://github.com/apache/airflow/issues/40435#issuecomment-2273948381

   Job of a failed task:
   
   ```
   [2024-08-03, 00:46:55 UTC] {{taskinstance.py:1956}} INFO - Dependencies all 
met for dep_context=non-requeueable deps ti=<TaskInstance: eod_jobs.task_123 
scheduled__2024-08-02T00:15:00+00:00 [queued]>
   [2024-08-03, 00:46:56 UTC] {{taskinstance.py:1956}} INFO - Dependencies all 
met for dep_context=requeueable deps ti=<TaskInstance: eod_jobs.task_123 
scheduled__2024-08-02T00:15:00+00:00 [queued]>
   [2024-08-03, 00:46:56 UTC] {{taskinstance.py:2170}} INFO - Starting attempt 
1 of 1
   [2024-08-03, 00:46:56 UTC] {{taskinstance.py:2191}} INFO - Executing 
<Task(BatchOperator): task_123> on 2024-08-02 00:15:00+00:00
   [2024-08-03, 00:46:56 UTC] {{standard_task_runner.py:60}} INFO - Started 
process 26709 to run task
   [2024-08-03, 00:46:57 UTC] {{taskinstance.py:2480}} INFO - Exporting env 
vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='eod_jobs' 
AIRFLOW_CTX_TASK_ID='task_123' 
AIRFLOW_CTX_EXECUTION_DATE='2024-08-02T00:15:00+00:00' 
AIRFLOW_CTX_TRY_NUMBER='1' 
AIRFLOW_CTX_DAG_RUN_ID='scheduled__2024-08-02T00:15:00+00:00'
   [2024-08-03, 00:46:57 UTC] {{batch.py:274}} INFO - Running AWS Batch job - 
job definition: some_job_definition - on queue some-queue
   [2024-08-03, 00:46:57 UTC] {{batch.py:281}} INFO - AWS Batch job - container 
overrides: {'command': ['bash', '-c', 'python -m some_module.batch.task_123  
--date=2024-08-02'], 'environment': []}
   [2024-08-03, 00:46:57 UTC] {{base_aws.py:161}} INFO - No connection ID 
provided. Fallback on boto3 credential strategy (region_name=None). See: 
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
   [2024-08-03, 00:46:59 UTC] {{batch.py:312}} INFO - AWS Batch job (JOB_UUID) 
started: REDACTED
   [2024-08-03, 00:47:01 UTC] {{taskinstance.py:2344}} INFO - Pausing task as 
DEFERRED. dag_id=eod_jobs, task_id=task_123, execution_date=20240802T001500, 
start_date=20240803T004656
   [2024-08-03, 00:47:02 UTC] {{base_aws.py:161}} INFO - No connection ID 
provided. Fallback on boto3 credential strategy (region_name=None). See: 
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
   [2024-08-03, 00:47:07 UTC] {{waiter_with_logging.py:129}} INFO - Batch job 
JOB_UUID not ready yet: ['STARTING']
   [2024-08-03, 00:47:22 UTC] {{local_task_job_runner.py:302}} WARNING - State 
of this instance has been externally set to deferred. Terminating instance.
   [2024-08-03, 00:47:23 UTC] {{process_utils.py:131}} INFO - Sending 15 to 
group 26709. PIDs of all processes in the group: [26709]
   [2024-08-03, 00:47:24 UTC] {{process_utils.py:86}} INFO - Sending the signal 
15 to group 26709
   [2024-08-03, 00:47:37 UTC] {{waiter_with_logging.py:129}} INFO - Batch job 
JOB_UUID not ready yet: ['STARTING']
   [2024-08-03, 00:47:44 UTC] {{process_utils.py:79}} INFO - Process 
psutil.Process(pid=26709, status='terminated', exitcode=100, 
started='00:46:56') (26709) terminated with exit code 100
   [2024-08-03, 00:48:07 UTC] {{waiter_with_logging.py:129}} INFO - Batch job 
JOB_UUID not ready yet: ['STARTING']
   [2024-08-03, 00:48:37 UTC] {{waiter_with_logging.py:126}} ERROR - Failure 
while running batch job JOB_UUID: ['FAILED']
   [2024-08-03, 00:56:04 UTC] {{taskinstance.py:1956}} INFO - Dependencies all 
met for dep_context=non-requeueable deps ti=<TaskInstance: eod_jobs.task_123 
scheduled__2024-08-02T00:15:00+00:00 [queued]>
   [2024-08-03, 00:56:04 UTC] {{taskinstance.py:1956}} INFO - Dependencies all 
met for dep_context=requeueable deps ti=<TaskInstance: eod_jobs.task_123 
scheduled__2024-08-02T00:15:00+00:00 [queued]>
   [2024-08-03, 00:56:04 UTC] {{taskinstance.py:2168}} INFO - Resuming after 
deferral
   [2024-08-03, 00:56:04 UTC] {{taskinstance.py:2191}} INFO - Executing 
<Task(BatchOperator): task_123> on 2024-08-02 00:15:00+00:00
   [2024-08-03, 00:56:04 UTC] {{standard_task_runner.py:60}} INFO - Started 
process 27106 to run task
   [2024-08-03, 00:56:04 UTC] {{standard_task_runner.py:87}} INFO - Running: 
['airflow', 'tasks', 'run', 'eod_jobs', 'task_123', 
'scheduled__2024-08-02T00:15:00+00:00', '--job-id', '36971', '--raw', 
'--subdir', 'DAGS_FOLDER/eod_jobs.py', '--cfg-path', '/tmp/tmpf1bgxn6c']
   [2024-08-03, 00:56:04 UTC] {{standard_task_runner.py:88}} INFO - Job 36971: 
Subtask task_123
   [2024-08-03, 00:56:05 UTC] {{baseoperator.py:1598}} ERROR - Trigger failed:
   Traceback (most recent call last):
   
     File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/utils/waiter_with_logging.py",
 line 123, in async_wait
       await waiter.wait(**args, WaiterConfig={"MaxAttempts": 1})
   
     File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/aiobotocore/waiter.py", 
line 49, in wait
       return await AIOWaiter.wait(self, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   
     File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/aiobotocore/waiter.py", 
line 126, in wait
       raise WaiterError(
   
   botocore.exceptions.WaiterError: Waiter batch_job_complete failed: Waiter 
encountered a terminal failure state: For expression "jobs[].status" all 
members matched excepted path: "FAILED"
   
   
   During handling of the above exception, another exception occurred:
   
   
   Traceback (most recent call last):
   
     File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/jobs/triggerer_job_runner.py",
 line 529, in cleanup_finished_triggers
       result = details["task"].result()
                ^^^^^^^^^^^^^^^^^^^^^^^^
   
     File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/jobs/triggerer_job_runner.py",
 line 601, in run_trigger
       async for event in trigger.run():
   
     File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/triggers/base.py",
 line 143, in run
       await async_wait(
   
     File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/utils/waiter_with_logging.py",
 line 127, in async_wait
       raise AirflowException(f"{failure_message}: {error}")
   
   airflow.exceptions.AirflowException: Failure while running batch job 
JOB_UUID: Waiter batch_job_complete failed: Waiter encountered a terminal 
failure state: For expression "jobs[].status" all members matched excepted 
path: "FAILED"
   
   [2024-08-03, 00:56:05 UTC] {{taskinstance.py:2698}} ERROR - Task failed with 
exception
   Traceback (most recent call last):
     File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py",
 line 433, in _execute_task
       result = execute_callable(context=context, **execute_callable_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/models/baseoperator.py",
 line 1599, in resume_execution
       raise TaskDeferralError(next_kwargs.get("error", "Unknown"))
   airflow.exceptions.TaskDeferralError: Trigger failure
   [2024-08-03, 00:56:05 UTC] {{taskinstance.py:1138}} INFO - Marking task as 
FAILED. dag_id=eod_jobs, task_id=task_123, execution_date=20240802T001500, 
start_date=20240803T004656, end_date=20240803T005605
   [2024-08-03, 00:56:05 UTC] {{logging_mixin.py:188}} WARNING - 
/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/providers/slack/hooks/slack_webhook.py:42
 UserWarning: You cannot override the default channel (chosen by the user who 
installed your app), username, or icon when you're using Incoming Webhooks to 
post messages. Instead, these values will always inherit from the associated 
Slack app configuration. See: 
https://api.slack.com/messaging/webhooks#advanced_message_formatting. It is 
possible to change this values only in Legacy Slack Integration Incoming 
Webhook: 
https://api.slack.com/legacy/custom-integrations/messaging/webhooks#legacy-customizations
   [2024-08-03, 00:56:06 UTC] {{standard_task_runner.py:107}} ERROR - Failed to 
execute job 36971 for task task_123 (Trigger failure; 27106)
   [2024-08-03, 00:56:07 UTC] {{local_task_job_runner.py:234}} INFO - Task 
exited with return code 1
   [2024-08-03, 00:56:07 UTC] {{taskinstance.py:3280}} INFO - 0 downstream 
tasks scheduled from follow-on schedule check
   ```
   Logs of a good task (same task after retry):
   
   ```
   [2024-08-03, 10:35:19 UTC] {{taskinstance.py:1956}} INFO - Dependencies all 
met for dep_context=non-requeueable deps ti=<TaskInstance: eod_jobs.task_123 
scheduled__2024-08-02T00:15:00+00:00 [queued]>
   [2024-08-03, 10:35:20 UTC] {{taskinstance.py:1956}} INFO - Dependencies all 
met for dep_context=requeueable deps ti=<TaskInstance: eod_jobs.task_123 
scheduled__2024-08-02T00:15:00+00:00 [queued]>
   [2024-08-03, 10:35:20 UTC] {{taskinstance.py:2170}} INFO - Starting attempt 
2 of 2
   [2024-08-03, 10:35:20 UTC] {{taskinstance.py:2191}} INFO - Executing 
<Task(BatchOperator): task_123> on 2024-08-02 00:15:00+00:00
   [2024-08-03, 10:35:20 UTC] {{standard_task_runner.py:60}} INFO - Started 
process 3875 to run task
   [2024-08-03, 10:35:20 UTC] {{standard_task_runner.py:87}} INFO - Running: 
['airflow', 'tasks', 'run', 'eod_jobs', 'task_123', 
'scheduled__2024-08-02T00:15:00+00:00', '--job-id', '37770', '--raw', 
'--subdir', 'DAGS_FOLDER/eod_jobs.py', '--cfg-path', '/tmp/tmpev95ge4j']
   [2024-08-03, 10:35:20 UTC] {{standard_task_runner.py:88}} INFO - Job 37770: 
Subtask task_123
   [2024-08-03, 10:35:20 UTC] {{taskinstance.py:2480}} INFO - Exporting env 
vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='eod_jobs' 
AIRFLOW_CTX_TASK_ID='task_123' 
AIRFLOW_CTX_EXECUTION_DATE='2024-08-02T00:15:00+00:00' 
AIRFLOW_CTX_TRY_NUMBER='2' 
AIRFLOW_CTX_DAG_RUN_ID='scheduled__2024-08-02T00:15:00+00:00'
   [2024-08-03, 10:35:20 UTC] {{batch.py:274}} INFO - Running AWS Batch job - 
job definition: some_job_definition - on queue some-queue
   [2024-08-03, 10:35:20 UTC] {{batch.py:281}} INFO - AWS Batch job - container 
overrides: {'command': ['bash', '-c', 'python -m some_module.batch.task_123  
--date=2024-08-02'], 'environment': []}
   [2024-08-03, 10:35:20 UTC] {{base_aws.py:161}} INFO - No connection ID 
provided. Fallback on boto3 credential strategy (region_name=None). See: 
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
   [2024-08-03, 10:35:21 UTC] {{batch.py:312}} INFO - AWS Batch job (JOB_UUID) 
started: REDACTED
   [2024-08-03, 10:35:21 UTC] {{taskinstance.py:2344}} INFO - Pausing task as 
DEFERRED. dag_id=eod_jobs, task_id=task_123, execution_date=20240802T001500, 
start_date=20240803T103519
   [2024-08-03, 10:35:21 UTC] {{local_task_job_runner.py:231}} INFO - Task 
exited with return code 100 (task deferral)
   [2024-08-03, 10:35:22 UTC] {{base_aws.py:161}} INFO - No connection ID 
provided. Fallback on boto3 credential strategy (region_name=None). See: 
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
   [2024-08-03, 10:35:22 UTC] {{waiter_with_logging.py:129}} INFO - Batch job 
JOB_UUID not ready yet: ['RUNNABLE']
   [2024-08-03, 10:35:53 UTC] {{waiter_with_logging.py:129}} INFO - Batch job 
JOB_UUID not ready yet: ['STARTING']
   [2024-08-03, 10:36:23 UTC] {{waiter_with_logging.py:129}} INFO - Batch job 
JOB_UUID not ready yet: ['STARTING']
   [2024-08-03, 10:36:53 UTC] {{waiter_with_logging.py:129}} INFO - Batch job 
JOB_UUID not ready yet: ['RUNNING']
   [2024-08-03, 10:37:23 UTC] {{triggerer_job_runner.py:602}} INFO - Trigger 
eod_jobs/scheduled__2024-08-02T00:15:00+00:00/task_123/-1/2 (ID 5992) fired: 
TriggerEvent<{'status': 'success', 'job_id': 'JOB_UUID'}>
   [2024-08-03, 10:37:28 UTC] {{taskinstance.py:1956}} INFO - Dependencies all 
met for dep_context=non-requeueable deps ti=<TaskInstance: eod_jobs.task_123 
scheduled__2024-08-02T00:15:00+00:00 [queued]>
   [2024-08-03, 10:37:28 UTC] {{taskinstance.py:1956}} INFO - Dependencies all 
met for dep_context=requeueable deps ti=<TaskInstance: eod_jobs.task_123 
scheduled__2024-08-02T00:15:00+00:00 [queued]>
   [2024-08-03, 10:37:28 UTC] {{taskinstance.py:2168}} INFO - Resuming after 
deferral
   [2024-08-03, 10:37:28 UTC] {{taskinstance.py:2191}} INFO - Executing 
<Task(BatchOperator): task_123> on 2024-08-02 00:15:00+00:00
   [2024-08-03, 10:37:28 UTC] {{standard_task_runner.py:60}} INFO - Started 
process 3913 to run task
   [2024-08-03, 10:37:28 UTC] {{standard_task_runner.py:87}} INFO - Running: 
['airflow', 'tasks', 'run', 'eod_jobs', 'task_123', 
'scheduled__2024-08-02T00:15:00+00:00', '--job-id', '37775', '--raw', 
'--subdir', 'DAGS_FOLDER/eod_jobs.py', '--cfg-path', '/tmp/tmpzry41zjk']
   [2024-08-03, 10:37:28 UTC] {{standard_task_runner.py:88}} INFO - Job 37775: 
Subtask task_123
   [2024-08-03, 10:37:29 UTC] {{batch.py:262}} INFO - Job completed.
   [2024-08-03, 10:37:29 UTC] {{taskinstance.py:1138}} INFO - Marking task as 
SUCCESS. dag_id=eod_jobs, task_id=task_123, execution_date=20240802T001500, 
start_date=20240803T103519, end_date=20240803T103729
   [2024-08-03, 10:37:29 UTC] {{local_task_job_runner.py:234}} INFO - Task 
exited with return code 0
   [2024-08-03, 10:37:29 UTC] {{taskinstance.py:3280}} INFO - 0 downstream 
tasks scheduled from follow-on schedule check
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to