anas-ismail opened a new issue, #35431:
URL: https://github.com/apache/airflow/issues/35431
### Apache Airflow version
Other Airflow 2 version (please specify below)
### What happened
With `apache-airflow-providers-amazon` **8.2.0** and **8.10.0**,
**GlueJobSensor** with `verbose` option set to **True** gets stuck in the
`running` state until it times out even though underlying Glue jobs (Glue 3.0)
are reaching the terminal state (FAILED or SUCCEEDED).
It works as expected when `verbose` option is set to **False** for
`apache-airflow-providers-amazon` **8.2.0** and **8.10.0**.
Also, it works as expected even when `verbose` option is set to **True**
when I pin **7.4.1** version of `apache-airflow-providers-amazon`.
### What you think should happen instead
1. **GlueJobSensor** should FAIL or SUCCEED when underlying Glue Job reaches
its terminal state with `verbose` option set to **True** .
2. **GlueJobSensor** should log Glue Job logs with `verbose` option set to
**True**
### How to reproduce
1. Create a DAG with Glue Job Operator and Glue Job Sensor running in
sequence.
2. Glue Job Sensor should have **verbose** set to **True**.
**DAG example:**
```
run_glue_job = GlueJobOperator(
task_id='run_glue_job',
job_name="{{ ti.xcom_pull(task_ids='get_glue_job_arguments',
key='jobname') }}",
wait_for_completion=False,
script_args="{{ ti.xcom_pull(task_ids='get_glue_job_arguments',
key='script_args') }}"
)
wait_on_glue_job = GlueJobSensor(
task_id='wait_on_glue_job',
job_name="{{ dag_run.conf.jobname }}",
run_id="{{ ti.xcom_pull(task_ids='run_glue_job', key='return_value')
}}",
mode='reschedule',
timeout=DEFAULT_GLUE_WAIT_TIMEOUT,
verbose=True
)
run_glue_job >> wait_on_glue_job
```
### Operating System
N/A
### Versions of Apache Airflow Providers
`apache-airflow-providers-amazon` Versions with the problem: **8.2.0** and
**8.10.0**
`apache-airflow-providers-amazon` Version working fine: **7.4.1**
### Deployment
Amazon (AWS) MWAA
### Deployment details
MWAA version 2.6.3
### Anything else
It happens every time.
I am attaching the Task logs from both problematic and non-problematic
provider versions:
### Logs from apache-airflow-providers-amazon 8.2.0
```
*** Reading remote log from Cloudwatch log_group:
airflow-rrs-alpha-mwaa-Task log_stream:
dag_id=sqlify_executor/run_id=manual__2023-11-03T18_06_14.935010+00_00/task_id=wait_on_glue_job/attempt=1.log.
[2023-11-03, 18:07:22 UTC] {{taskinstance.py:1103}} INFO - Dependencies all
met for dep_context=non-requeueable deps ti=<TaskInstance:
sqlify_executor.wait_on_glue_job manual__2023-11-03T18:06:14.935010+00:00
[queued]>
[2023-11-03, 18:07:22 UTC] {{taskinstance.py:1103}} INFO - Dependencies all
met for dep_context=requeueable deps ti=<TaskInstance:
sqlify_executor.wait_on_glue_job manual__2023-11-03T18:06:14.935010+00:00
[queued]>
[2023-11-03, 18:07:22 UTC] {{taskinstance.py:1308}} INFO - Starting attempt
1 of 1
[2023-11-03, 18:07:22 UTC] {{taskinstance.py:1327}} INFO - Executing
<Task(GlueJobSensor): wait_on_glue_job> on 2023-11-03 18:06:14.935010+00:00
[2023-11-03, 18:07:22 UTC] {{standard_task_runner.py:57}} INFO - Started
process 387 to run task
[2023-11-03, 18:07:22 UTC] {{standard_task_runner.py:84}} INFO - Running:
['airflow', 'tasks', 'run', 'sqlify_executor', 'wait_on_glue_job',
'manual__2023-11-03T18:06:14.935010+00:00', '--job-id', '2446', '--raw',
'--subdir', 'DAGS_FOLDER/sqlify_executor.py', '--cfg-path', '/tmp/tmpvjto6uxf']
[2023-11-03, 18:07:22 UTC] {{standard_task_runner.py:85}} INFO - Job 2446:
Subtask wait_on_glue_job
[2023-11-03, 18:07:22 UTC] {{task_command.py:410}} INFO - Running
<TaskInstance: sqlify_executor.wait_on_glue_job
manual__2023-11-03T18:06:14.935010+00:00 [running]> on host
ip-10-0-139-229.ec2.internal
[2023-11-03, 18:07:22 UTC] {{taskinstance.py:1545}} INFO - Exporting env
vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='sqlify_executor'
AIRFLOW_CTX_TASK_ID='wait_on_glue_job'
AIRFLOW_CTX_EXECUTION_DATE='2023-11-03T18:06:14.935010+00:00'
AIRFLOW_CTX_TRY_NUMBER='1'
AIRFLOW_CTX_DAG_RUN_ID='manual__2023-11-03T18:06:14.935010+00:00'
[2023-11-03, 18:07:22 UTC] {{glue.py:71}} INFO - Poking for job run status
:for Glue Job RRSSqlExecutorHeavyV3G1X and ID
jr_90de7aa69fd8dac7db96b6839b2d4142e9aa89c20456785d494ed15a68da992f
[2023-11-03, 18:07:22 UTC] {{base.py:73}} INFO - Using connection ID
'aws_default' for task execution.
[2023-11-03, 18:07:23 UTC] {{glue.py:247}} WARNING - No new Glue driver logs
so far.
If this persists, check the CloudWatch dashboard at:
https://us-east-1.console.aws.amazon.com/cloudwatch/home
[2023-11-03, 18:07:23 UTC] {{glue.py:260}} INFO - No new log from the Glue
Job in /aws-glue/jobs/output
[2023-11-03, 18:07:23 UTC] {{glue.py:247}} WARNING - No new Glue driver logs
so far.
If this persists, check the CloudWatch dashboard at:
https://us-east-1.console.aws.amazon.com/cloudwatch/home
[2023-11-03, 18:07:23 UTC] {{glue.py:260}} INFO - No new log from the Glue
Job in /aws-glue/jobs/error
[2023-11-03, 18:07:23 UTC] {{taskinstance.py:1784}} INFO - Rescheduling
task, marking task as UP_FOR_RESCHEDULE
[2023-11-03, 18:07:23 UTC] {{local_task_job_runner.py:225}} INFO - Task
exited with return code 0
[2023-11-03, 18:07:23 UTC] {{taskinstance.py:2653}} INFO - 0 downstream
tasks scheduled from follow-on schedule check
[2023-11-03, 18:08:28 UTC] {{taskinstance.py:1103}} INFO - Dependencies all
met for dep_context=non-requeueable deps ti=<TaskInstance:
sqlify_executor.wait_on_glue_job manual__2023-11-03T18:06:14.935010+00:00
[queued]>
[2023-11-03, 18:08:28 UTC] {{taskinstance.py:1103}} INFO - Dependencies all
met for dep_context=requeueable deps ti=<TaskInstance:
sqlify_executor.wait_on_glue_job manual__2023-11-03T18:06:14.935010+00:00
[queued]>
[2023-11-03, 18:08:28 UTC] {{taskinstance.py:1308}} INFO - Starting attempt
1 of 1
[2023-11-03, 18:08:28 UTC] {{taskinstance.py:1327}} INFO - Executing
<Task(GlueJobSensor): wait_on_glue_job> on 2023-11-03 18:06:14.935010+00:00
[2023-11-03, 18:08:28 UTC] {{standard_task_runner.py:57}} INFO - Started
process 393 to run task
[2023-11-03, 18:08:28 UTC] {{standard_task_runner.py:84}} INFO - Running:
['airflow', 'tasks', 'run', 'sqlify_executor', 'wait_on_glue_job',
'manual__2023-11-03T18:06:14.935010+00:00', '--job-id', '2447', '--raw',
'--subdir', 'DAGS_FOLDER/sqlify_executor.py', '--cfg-path', '/tmp/tmp5l5ilat_']
[2023-11-03, 18:08:28 UTC] {{standard_task_runner.py:85}} INFO - Job 2447:
Subtask wait_on_glue_job
[2023-11-03, 18:08:28 UTC] {{task_command.py:410}} INFO - Running
<TaskInstance: sqlify_executor.wait_on_glue_job
manual__2023-11-03T18:06:14.935010+00:00 [running]> on host
ip-10-0-139-229.ec2.internal
[2023-11-03, 18:08:28 UTC] {{taskinstance.py:1545}} INFO - Exporting env
vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='sqlify_executor'
AIRFLOW_CTX_TASK_ID='wait_on_glue_job'
AIRFLOW_CTX_EXECUTION_DATE='2023-11-03T18:06:14.935010+00:00'
AIRFLOW_CTX_TRY_NUMBER='1'
AIRFLOW_CTX_DAG_RUN_ID='manual__2023-11-03T18:06:14.935010+00:00'
[2023-11-03, 18:08:28 UTC] {{glue.py:71}} INFO - Poking for job run status
:for Glue Job RRSSqlExecutorHeavyV3G1X and ID
jr_90de7aa69fd8dac7db96b6839b2d4142e9aa89c20456785d494ed15a68da992f
[2023-11-03, 18:08:29 UTC] {{base.py:73}} INFO - Using connection ID
'aws_default' for task execution.
[2023-11-03, 18:08:29 UTC] {{glue.py:247}} WARNING - No new Glue driver logs
so far.
If this persists, check the CloudWatch dashboard at:
https://us-east-1.console.aws.amazon.com/cloudwatch/home
[2023-11-03, 18:08:29 UTC] {{glue.py:260}} INFO - No new log from the Glue
Job in /aws-glue/jobs/output
```
### Logs from apache-airflow-providers-amazon 7.4.1
```
*** Reading remote log from Cloudwatch log_group:
airflow-rrs-gamma-mwaa-Task log_stream:
dag_id=sqlify_executor/run_id=manual__2023-10-31T16_45_52.387595+00_00/task_id=wait_on_glue_job/attempt=1.log.
[2023-10-31, 16:47:11 UTC] {{taskinstance.py:1083}} INFO - Dependencies all
met for <TaskInstance: sqlify_executor.wait_on_glue_job
manual__2023-10-31T16:45:52.387595+00:00 [queued]>
[2023-10-31, 16:47:11 UTC] {{taskinstance.py:1083}} INFO - Dependencies all
met for <TaskInstance: sqlify_executor.wait_on_glue_job
manual__2023-10-31T16:45:52.387595+00:00 [queued]>
[2023-10-31, 16:47:11 UTC] {{taskinstance.py:1279}} INFO -
--------------------------------------------------------------------------------
[2023-10-31, 16:47:11 UTC] {{taskinstance.py:1280}} INFO - Starting attempt
1 of 1
[2023-10-31, 16:47:11 UTC] {{taskinstance.py:1281}} INFO -
--------------------------------------------------------------------------------
[2023-10-31, 16:47:11 UTC] {{taskinstance.py:1300}} INFO - Executing
<Task(GlueJobSensor): wait_on_glue_job> on 2023-10-31 16:45:52.387595+00:00
[2023-10-31, 16:47:11 UTC] {{standard_task_runner.py:55}} INFO - Started
process 18088 to run task
[2023-10-31, 16:47:11 UTC] {{standard_task_runner.py:82}} INFO - Running:
['airflow', 'tasks', 'run', 'sqlify_executor', 'wait_on_glue_job',
'manual__2023-10-31T16:45:52.387595+00:00', '--job-id', '11214', '--raw',
'--subdir', 'DAGS_FOLDER/sqlify_executor.py', '--cfg-path', '/tmp/tmpl64hl41l']
[2023-10-31, 16:47:11 UTC] {{standard_task_runner.py:83}} INFO - Job 11214:
Subtask wait_on_glue_job
[2023-10-31, 16:47:11 UTC] {{task_command.py:388}} INFO - Running
<TaskInstance: sqlify_executor.wait_on_glue_job
manual__2023-10-31T16:45:52.387595+00:00 [running]> on host
ip-10-0-157-23.ec2.internal
[2023-10-31, 16:47:12 UTC] {{taskinstance.py:1507}} INFO - Exporting the
following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=sqlify_executor
AIRFLOW_CTX_TASK_ID=wait_on_glue_job
AIRFLOW_CTX_EXECUTION_DATE=2023-10-31T16:45:52.387595+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=manual__2023-10-31T16:45:52.387595+00:00
[2023-10-31, 16:47:12 UTC] {{glue.py:70}} INFO - Poking for job run status
:for Glue Job RRSSqlExecutorHeavyV3G1X and ID
jr_6726789ecc1a71a19427bc7d13ab3a06546fc684324a02d13aa8067f59b9b465
[2023-10-31, 16:47:12 UTC] {{base.py:73}} INFO - Using connection ID
'aws_default' for task execution.
[2023-10-31, 16:47:12 UTC] {{glue.py:228}} WARNING - No new Glue driver logs
found. This might be because there are no new logs, or might be an error.
If the error persists, check the CloudWatch dashboard at:
https://us-east-1.console.aws.amazon.com/cloudwatch/home
[2023-10-31, 16:47:12 UTC] {{taskinstance.py:1732}} INFO - Rescheduling
task, marking task as UP_FOR_RESCHEDULE
[2023-10-31, 16:47:12 UTC] {{local_task_job.py:208}} INFO - Task exited with
return code 0
[2023-10-31, 16:47:12 UTC] {{taskinstance.py:2578}} INFO - 0 downstream
tasks scheduled from follow-on schedule check
[2023-10-31, 16:48:16 UTC] {{taskinstance.py:1083}} INFO - Dependencies all
met for <TaskInstance: sqlify_executor.wait_on_glue_job
manual__2023-10-31T16:45:52.387595+00:00 [queued]>
[2023-10-31, 16:48:16 UTC] {{taskinstance.py:1083}} INFO - Dependencies all
met for <TaskInstance: sqlify_executor.wait_on_glue_job
manual__2023-10-31T16:45:52.387595+00:00 [queued]>
[2023-10-31, 16:48:16 UTC] {{taskinstance.py:1279}} INFO -
--------------------------------------------------------------------------------
[2023-10-31, 16:48:16 UTC] {{taskinstance.py:1280}} INFO - Starting attempt
1 of 1
[2023-10-31, 16:48:16 UTC] {{taskinstance.py:1281}} INFO -
--------------------------------------------------------------------------------
[2023-10-31, 16:48:16 UTC] {{taskinstance.py:1300}} INFO - Executing
<Task(GlueJobSensor): wait_on_glue_job> on 2023-10-31 16:45:52.387595+00:00
[2023-10-31, 16:48:16 UTC] {{standard_task_runner.py:55}} INFO - Started
process 18104 to run task
[2023-10-31, 16:48:16 UTC] {{standard_task_runner.py:82}} INFO - Running:
['airflow', 'tasks', 'run', 'sqlify_executor', 'wait_on_glue_job',
'manual__2023-10-31T16:45:52.387595+00:00', '--job-id', '11215', '--raw',
'--subdir', 'DAGS_FOLDER/sqlify_executor.py', '--cfg-path', '/tmp/tmpl_akvt7z']
[2023-10-31, 16:48:16 UTC] {{standard_task_runner.py:83}} INFO - Job 11215:
Subtask wait_on_glue_job
[2023-10-31, 16:48:17 UTC] {{task_command.py:388}} INFO - Running
<TaskInstance: sqlify_executor.wait_on_glue_job
manual__2023-10-31T16:45:52.387595+00:00 [running]> on host
ip-10-0-157-23.ec2.internal
[2023-10-31, 16:48:17 UTC] {{taskinstance.py:1507}} INFO - Exporting the
following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=sqlify_executor
AIRFLOW_CTX_TASK_ID=wait_on_glue_job
AIRFLOW_CTX_EXECUTION_DATE=2023-10-31T16:45:52.387595+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=manual__2023-10-31T16:45:52.387595+00:00
[2023-10-31, 16:48:17 UTC] {{glue.py:70}} INFO - Poking for job run status
:for Glue Job RRSSqlExecutorHeavyV3G1X and ID
jr_6726789ecc1a71a19427bc7d13ab3a06546fc684324a02d13aa8067f59b9b465
[2023-10-31, 16:48:17 UTC] {{base.py:73}} INFO - Using connection ID
'aws_default' for task execution.
[2023-10-31, 16:48:17 UTC] {{taskinstance.py:1732}} INFO - Rescheduling
task, marking task as UP_FOR_RESCHEDULE
[2023-10-31, 16:48:17 UTC] {{local_task_job.py:208}} INFO - Task exited with
return code 0
[2023-10-31, 16:48:17 UTC] {{taskinstance.py:2578}} INFO - 0 downstream
tasks scheduled from follow-on schedule check
[2023-10-31, 16:49:20 UTC] {{taskinstance.py:1083}} INFO - Dependencies all
met for <TaskInstance: sqlify_executor.wait_on_glue_job
manual__2023-10-31T16:45:52.387595+00:00 [queued]>
[2023-10-31, 16:49:20 UTC] {{taskinstance.py:1083}} INFO - Dependencies all
met for <TaskInstance: sqlify_executor.wait_on_glue_job
manual__2023-10-31T16:45:52.387595+00:00 [queued]>
[2023-10-31, 16:49:20 UTC] {{taskinstance.py:1279}} INFO -
--------------------------------------------------------------------------------
[2023-10-31, 16:49:20 UTC] {{taskinstance.py:1280}} INFO - Starting attempt
1 of 1
[2023-10-31, 16:49:20 UTC] {{taskinstance.py:1281}} INFO -
--------------------------------------------------------------------------------
[2023-10-31, 16:49:20 UTC] {{taskinstance.py:1300}} INFO - Executing
<Task(GlueJobSensor): wait_on_glue_job> on 2023-10-31 16:45:52.387595+00:00
[2023-10-31, 16:49:20 UTC] {{standard_task_runner.py:55}} INFO - Started
process 17125 to run task
[2023-10-31, 16:49:20 UTC] {{standard_task_runner.py:82}} INFO - Running:
['airflow', 'tasks', 'run', 'sqlify_executor', 'wait_on_glue_job',
'manual__2023-10-31T16:45:52.387595+00:00', '--job-id', '11216', '--raw',
'--subdir', 'DAGS_FOLDER/sqlify_executor.py', '--cfg-path', '/tmp/tmpjfbzqnjm']
[2023-10-31, 16:49:20 UTC] {{standard_task_runner.py:83}} INFO - Job 11216:
Subtask wait_on_glue_job
[2023-10-31, 16:49:21 UTC] {{task_command.py:388}} INFO - Running
<TaskInstance: sqlify_executor.wait_on_glue_job
manual__2023-10-31T16:45:52.387595+00:00 [running]> on host
ip-10-0-202-96.ec2.internal
[2023-10-31, 16:49:21 UTC] {{taskinstance.py:1507}} INFO - Exporting the
following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=sqlify_executor
AIRFLOW_CTX_TASK_ID=wait_on_glue_job
AIRFLOW_CTX_EXECUTION_DATE=2023-10-31T16:45:52.387595+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=manual__2023-10-31T16:45:52.387595+00:00
[2023-10-31, 16:49:21 UTC] {{glue.py:70}} INFO - Poking for job run status
:for Glue Job RRSSqlExecutorHeavyV3G1X and ID
jr_6726789ecc1a71a19427bc7d13ab3a06546fc684324a02d13aa8067f59b9b465
[2023-10-31, 16:49:21 UTC] {{base.py:73}} INFO - Using connection ID
'aws_default' for task execution.
[2023-10-31, 16:49:21 UTC] {{glue.py:81}} INFO - ('Exiting Job %s Run State:
%s', 'jr_6726789ecc1a71a19427bc7d13ab3a06546fc684324a02d13aa8067f59b9b465',
'FAILED')
[2023-10-31, 16:49:21 UTC] {{taskinstance.py:1768}} ERROR - Task failed with
exception
Traceback (most recent call last):
File
"/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/sensors/base.py",
line 199, in execute
poke_return = self.poke(context)
File
"/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/providers/amazon/aws/sensors/glue.py",
line 82, in poke
raise AirflowException(job_error_message)
airflow.exceptions.AirflowException: ('Exiting Job %s Run State: %s',
'jr_6726789ecc1a71a19427bc7d13ab3a06546fc684324a02d13aa8067f59b9b465', 'FAILED')
[2023-10-31, 16:49:21 UTC] {{taskinstance.py:1318}} INFO - Marking task as
FAILED. dag_id=sqlify_executor, task_id=wait_on_glue_job,
execution_date=20231031T164552, start_date=20231031T164920,
end_date=20231031T164921
[2023-10-31, 16:49:21 UTC] {{standard_task_runner.py:100}} ERROR - Failed to
execute job 11216 for task wait_on_glue_job (('Exiting Job %s Run State: %s',
'jr_6726789ecc1a71a19427bc7d13ab3a06546fc684324a02d13aa8067f59b9b465',
'FAILED'); 17125)
[2023-10-31, 16:49:22 UTC] {{local_task_job.py:208}} INFO - Task exited with
return code 1
[2023-10-31, 16:49:22 UTC] {{taskinstance.py:2578}} INFO - 0 downstream
tasks scheduled from follow-on schedule check
```
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]