anas-ismail opened a new issue, #35431:
URL: https://github.com/apache/airflow/issues/35431

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   With `apache-airflow-providers-amazon` **8.2.0** and **8.10.0**, 
**GlueJobSensor** with `verbose` option set to **True** gets stuck in the 
`running` state until it times out even though underlying Glue jobs (Glue 3.0) 
are reaching the terminal state (FAILED or SUCCEEDED). 
   
   It works as expected when `verbose` option is set to **False** for 
`apache-airflow-providers-amazon` **8.2.0** and **8.10.0**.
   
   Also, it works as expected even when `verbose` option is set to **True** 
when I pin **7.4.1** version of `apache-airflow-providers-amazon`.
   
   ### What you think should happen instead
   
   1. **GlueJobSensor** should FAIL or SUCCEED when underlying Glue Job reaches 
its terminal state with `verbose` option set to **True** .
   2. **GlueJobSensor** should log Glue Job logs with `verbose` option set to 
**True** 
   
   ### How to reproduce
   
   1. Create a DAG with Glue Job Operator and Glue Job Sensor running in 
sequence.
   2. Glue Job Sensor should have **verbose** set to **True**.
   
   **DAG example:**
   ```
       run_glue_job = GlueJobOperator(
           task_id='run_glue_job',
           job_name="{{ ti.xcom_pull(task_ids='get_glue_job_arguments', 
key='jobname') }}",
           wait_for_completion=False,
           script_args="{{ ti.xcom_pull(task_ids='get_glue_job_arguments', 
key='script_args') }}"
       )
   
   
       wait_on_glue_job = GlueJobSensor(
           task_id='wait_on_glue_job',
           job_name="{{ dag_run.conf.jobname }}",
           run_id="{{ ti.xcom_pull(task_ids='run_glue_job', key='return_value') 
}}",
           mode='reschedule',
           timeout=DEFAULT_GLUE_WAIT_TIMEOUT,
           verbose=True
       )
   
       run_glue_job >> wait_on_glue_job
   ```
   
   ### Operating System
   
   N/A
   
   ### Versions of Apache Airflow Providers
   
   `apache-airflow-providers-amazon` Versions with the problem:  **8.2.0** and 
**8.10.0**
   `apache-airflow-providers-amazon` Version working fine: **7.4.1**
   
   ### Deployment
   
   Amazon (AWS) MWAA
   
   ### Deployment details
   
   MWAA version 2.6.3
   
   ### Anything else
   
   It happens every time.
   I am attaching the Task logs from both problematic and non-problematic 
provider versions:
   
   ### Logs from apache-airflow-providers-amazon 8.2.0
   
   ```
   *** Reading remote log from Cloudwatch log_group: 
airflow-rrs-alpha-mwaa-Task log_stream: 
dag_id=sqlify_executor/run_id=manual__2023-11-03T18_06_14.935010+00_00/task_id=wait_on_glue_job/attempt=1.log.
   [2023-11-03, 18:07:22 UTC] {{taskinstance.py:1103}} INFO - Dependencies all 
met for dep_context=non-requeueable deps ti=<TaskInstance: 
sqlify_executor.wait_on_glue_job manual__2023-11-03T18:06:14.935010+00:00 
[queued]>
   [2023-11-03, 18:07:22 UTC] {{taskinstance.py:1103}} INFO - Dependencies all 
met for dep_context=requeueable deps ti=<TaskInstance: 
sqlify_executor.wait_on_glue_job manual__2023-11-03T18:06:14.935010+00:00 
[queued]>
   [2023-11-03, 18:07:22 UTC] {{taskinstance.py:1308}} INFO - Starting attempt 
1 of 1
   [2023-11-03, 18:07:22 UTC] {{taskinstance.py:1327}} INFO - Executing 
<Task(GlueJobSensor): wait_on_glue_job> on 2023-11-03 18:06:14.935010+00:00
   [2023-11-03, 18:07:22 UTC] {{standard_task_runner.py:57}} INFO - Started 
process 387 to run task
   [2023-11-03, 18:07:22 UTC] {{standard_task_runner.py:84}} INFO - Running: 
['airflow', 'tasks', 'run', 'sqlify_executor', 'wait_on_glue_job', 
'manual__2023-11-03T18:06:14.935010+00:00', '--job-id', '2446', '--raw', 
'--subdir', 'DAGS_FOLDER/sqlify_executor.py', '--cfg-path', '/tmp/tmpvjto6uxf']
   [2023-11-03, 18:07:22 UTC] {{standard_task_runner.py:85}} INFO - Job 2446: 
Subtask wait_on_glue_job
   [2023-11-03, 18:07:22 UTC] {{task_command.py:410}} INFO - Running 
<TaskInstance: sqlify_executor.wait_on_glue_job 
manual__2023-11-03T18:06:14.935010+00:00 [running]> on host 
ip-10-0-139-229.ec2.internal
   [2023-11-03, 18:07:22 UTC] {{taskinstance.py:1545}} INFO - Exporting env 
vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='sqlify_executor' 
AIRFLOW_CTX_TASK_ID='wait_on_glue_job' 
AIRFLOW_CTX_EXECUTION_DATE='2023-11-03T18:06:14.935010+00:00' 
AIRFLOW_CTX_TRY_NUMBER='1' 
AIRFLOW_CTX_DAG_RUN_ID='manual__2023-11-03T18:06:14.935010+00:00'
   [2023-11-03, 18:07:22 UTC] {{glue.py:71}} INFO - Poking for job run status 
:for Glue Job RRSSqlExecutorHeavyV3G1X and ID 
jr_90de7aa69fd8dac7db96b6839b2d4142e9aa89c20456785d494ed15a68da992f
   [2023-11-03, 18:07:22 UTC] {{base.py:73}} INFO - Using connection ID 
'aws_default' for task execution.
   [2023-11-03, 18:07:23 UTC] {{glue.py:247}} WARNING - No new Glue driver logs 
so far.
   If this persists, check the CloudWatch dashboard at: 
https://us-east-1.console.aws.amazon.com/cloudwatch/home
   [2023-11-03, 18:07:23 UTC] {{glue.py:260}} INFO - No new log from the Glue 
Job in /aws-glue/jobs/output
   [2023-11-03, 18:07:23 UTC] {{glue.py:247}} WARNING - No new Glue driver logs 
so far.
   If this persists, check the CloudWatch dashboard at: 
https://us-east-1.console.aws.amazon.com/cloudwatch/home
   [2023-11-03, 18:07:23 UTC] {{glue.py:260}} INFO - No new log from the Glue 
Job in /aws-glue/jobs/error
   [2023-11-03, 18:07:23 UTC] {{taskinstance.py:1784}} INFO - Rescheduling 
task, marking task as UP_FOR_RESCHEDULE
   [2023-11-03, 18:07:23 UTC] {{local_task_job_runner.py:225}} INFO - Task 
exited with return code 0
   [2023-11-03, 18:07:23 UTC] {{taskinstance.py:2653}} INFO - 0 downstream 
tasks scheduled from follow-on schedule check
   [2023-11-03, 18:08:28 UTC] {{taskinstance.py:1103}} INFO - Dependencies all 
met for dep_context=non-requeueable deps ti=<TaskInstance: 
sqlify_executor.wait_on_glue_job manual__2023-11-03T18:06:14.935010+00:00 
[queued]>
   [2023-11-03, 18:08:28 UTC] {{taskinstance.py:1103}} INFO - Dependencies all 
met for dep_context=requeueable deps ti=<TaskInstance: 
sqlify_executor.wait_on_glue_job manual__2023-11-03T18:06:14.935010+00:00 
[queued]>
   [2023-11-03, 18:08:28 UTC] {{taskinstance.py:1308}} INFO - Starting attempt 
1 of 1
   [2023-11-03, 18:08:28 UTC] {{taskinstance.py:1327}} INFO - Executing 
<Task(GlueJobSensor): wait_on_glue_job> on 2023-11-03 18:06:14.935010+00:00
   [2023-11-03, 18:08:28 UTC] {{standard_task_runner.py:57}} INFO - Started 
process 393 to run task
   [2023-11-03, 18:08:28 UTC] {{standard_task_runner.py:84}} INFO - Running: 
['airflow', 'tasks', 'run', 'sqlify_executor', 'wait_on_glue_job', 
'manual__2023-11-03T18:06:14.935010+00:00', '--job-id', '2447', '--raw', 
'--subdir', 'DAGS_FOLDER/sqlify_executor.py', '--cfg-path', '/tmp/tmp5l5ilat_']
   [2023-11-03, 18:08:28 UTC] {{standard_task_runner.py:85}} INFO - Job 2447: 
Subtask wait_on_glue_job
   [2023-11-03, 18:08:28 UTC] {{task_command.py:410}} INFO - Running 
<TaskInstance: sqlify_executor.wait_on_glue_job 
manual__2023-11-03T18:06:14.935010+00:00 [running]> on host 
ip-10-0-139-229.ec2.internal
   [2023-11-03, 18:08:28 UTC] {{taskinstance.py:1545}} INFO - Exporting env 
vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='sqlify_executor' 
AIRFLOW_CTX_TASK_ID='wait_on_glue_job' 
AIRFLOW_CTX_EXECUTION_DATE='2023-11-03T18:06:14.935010+00:00' 
AIRFLOW_CTX_TRY_NUMBER='1' 
AIRFLOW_CTX_DAG_RUN_ID='manual__2023-11-03T18:06:14.935010+00:00'
   [2023-11-03, 18:08:28 UTC] {{glue.py:71}} INFO - Poking for job run status 
:for Glue Job RRSSqlExecutorHeavyV3G1X and ID 
jr_90de7aa69fd8dac7db96b6839b2d4142e9aa89c20456785d494ed15a68da992f
   [2023-11-03, 18:08:29 UTC] {{base.py:73}} INFO - Using connection ID 
'aws_default' for task execution.
   [2023-11-03, 18:08:29 UTC] {{glue.py:247}} WARNING - No new Glue driver logs 
so far.
   If this persists, check the CloudWatch dashboard at: 
https://us-east-1.console.aws.amazon.com/cloudwatch/home
   [2023-11-03, 18:08:29 UTC] {{glue.py:260}} INFO - No new log from the Glue 
Job in /aws-glue/jobs/output
   ```
   
   ### Logs from apache-airflow-providers-amazon 7.4.1
   ```
   *** Reading remote log from Cloudwatch log_group: 
airflow-rrs-gamma-mwaa-Task log_stream: 
dag_id=sqlify_executor/run_id=manual__2023-10-31T16_45_52.387595+00_00/task_id=wait_on_glue_job/attempt=1.log.
   [2023-10-31, 16:47:11 UTC] {{taskinstance.py:1083}} INFO - Dependencies all 
met for <TaskInstance: sqlify_executor.wait_on_glue_job 
manual__2023-10-31T16:45:52.387595+00:00 [queued]>
   [2023-10-31, 16:47:11 UTC] {{taskinstance.py:1083}} INFO - Dependencies all 
met for <TaskInstance: sqlify_executor.wait_on_glue_job 
manual__2023-10-31T16:45:52.387595+00:00 [queued]>
   [2023-10-31, 16:47:11 UTC] {{taskinstance.py:1279}} INFO - 
   
--------------------------------------------------------------------------------
   [2023-10-31, 16:47:11 UTC] {{taskinstance.py:1280}} INFO - Starting attempt 
1 of 1
   [2023-10-31, 16:47:11 UTC] {{taskinstance.py:1281}} INFO - 
   
--------------------------------------------------------------------------------
   [2023-10-31, 16:47:11 UTC] {{taskinstance.py:1300}} INFO - Executing 
<Task(GlueJobSensor): wait_on_glue_job> on 2023-10-31 16:45:52.387595+00:00
   [2023-10-31, 16:47:11 UTC] {{standard_task_runner.py:55}} INFO - Started 
process 18088 to run task
   [2023-10-31, 16:47:11 UTC] {{standard_task_runner.py:82}} INFO - Running: 
['airflow', 'tasks', 'run', 'sqlify_executor', 'wait_on_glue_job', 
'manual__2023-10-31T16:45:52.387595+00:00', '--job-id', '11214', '--raw', 
'--subdir', 'DAGS_FOLDER/sqlify_executor.py', '--cfg-path', '/tmp/tmpl64hl41l']
   [2023-10-31, 16:47:11 UTC] {{standard_task_runner.py:83}} INFO - Job 11214: 
Subtask wait_on_glue_job
   [2023-10-31, 16:47:11 UTC] {{task_command.py:388}} INFO - Running 
<TaskInstance: sqlify_executor.wait_on_glue_job 
manual__2023-10-31T16:45:52.387595+00:00 [running]> on host 
ip-10-0-157-23.ec2.internal
   [2023-10-31, 16:47:12 UTC] {{taskinstance.py:1507}} INFO - Exporting the 
following env vars:
   AIRFLOW_CTX_DAG_OWNER=airflow
   AIRFLOW_CTX_DAG_ID=sqlify_executor
   AIRFLOW_CTX_TASK_ID=wait_on_glue_job
   AIRFLOW_CTX_EXECUTION_DATE=2023-10-31T16:45:52.387595+00:00
   AIRFLOW_CTX_TRY_NUMBER=1
   AIRFLOW_CTX_DAG_RUN_ID=manual__2023-10-31T16:45:52.387595+00:00
   [2023-10-31, 16:47:12 UTC] {{glue.py:70}} INFO - Poking for job run status 
:for Glue Job RRSSqlExecutorHeavyV3G1X and ID 
jr_6726789ecc1a71a19427bc7d13ab3a06546fc684324a02d13aa8067f59b9b465
   [2023-10-31, 16:47:12 UTC] {{base.py:73}} INFO - Using connection ID 
'aws_default' for task execution.
   [2023-10-31, 16:47:12 UTC] {{glue.py:228}} WARNING - No new Glue driver logs 
found. This might be because there are no new logs, or might be an error.
   If the error persists, check the CloudWatch dashboard at: 
https://us-east-1.console.aws.amazon.com/cloudwatch/home
   [2023-10-31, 16:47:12 UTC] {{taskinstance.py:1732}} INFO - Rescheduling 
task, marking task as UP_FOR_RESCHEDULE
   [2023-10-31, 16:47:12 UTC] {{local_task_job.py:208}} INFO - Task exited with 
return code 0
   [2023-10-31, 16:47:12 UTC] {{taskinstance.py:2578}} INFO - 0 downstream 
tasks scheduled from follow-on schedule check
   [2023-10-31, 16:48:16 UTC] {{taskinstance.py:1083}} INFO - Dependencies all 
met for <TaskInstance: sqlify_executor.wait_on_glue_job 
manual__2023-10-31T16:45:52.387595+00:00 [queued]>
   [2023-10-31, 16:48:16 UTC] {{taskinstance.py:1083}} INFO - Dependencies all 
met for <TaskInstance: sqlify_executor.wait_on_glue_job 
manual__2023-10-31T16:45:52.387595+00:00 [queued]>
   [2023-10-31, 16:48:16 UTC] {{taskinstance.py:1279}} INFO - 
   
--------------------------------------------------------------------------------
   [2023-10-31, 16:48:16 UTC] {{taskinstance.py:1280}} INFO - Starting attempt 
1 of 1
   [2023-10-31, 16:48:16 UTC] {{taskinstance.py:1281}} INFO - 
   
--------------------------------------------------------------------------------
   [2023-10-31, 16:48:16 UTC] {{taskinstance.py:1300}} INFO - Executing 
<Task(GlueJobSensor): wait_on_glue_job> on 2023-10-31 16:45:52.387595+00:00
   [2023-10-31, 16:48:16 UTC] {{standard_task_runner.py:55}} INFO - Started 
process 18104 to run task
   [2023-10-31, 16:48:16 UTC] {{standard_task_runner.py:82}} INFO - Running: 
['airflow', 'tasks', 'run', 'sqlify_executor', 'wait_on_glue_job', 
'manual__2023-10-31T16:45:52.387595+00:00', '--job-id', '11215', '--raw', 
'--subdir', 'DAGS_FOLDER/sqlify_executor.py', '--cfg-path', '/tmp/tmpl_akvt7z']
   [2023-10-31, 16:48:16 UTC] {{standard_task_runner.py:83}} INFO - Job 11215: 
Subtask wait_on_glue_job
   [2023-10-31, 16:48:17 UTC] {{task_command.py:388}} INFO - Running 
<TaskInstance: sqlify_executor.wait_on_glue_job 
manual__2023-10-31T16:45:52.387595+00:00 [running]> on host 
ip-10-0-157-23.ec2.internal
   [2023-10-31, 16:48:17 UTC] {{taskinstance.py:1507}} INFO - Exporting the 
following env vars:
   AIRFLOW_CTX_DAG_OWNER=airflow
   AIRFLOW_CTX_DAG_ID=sqlify_executor
   AIRFLOW_CTX_TASK_ID=wait_on_glue_job
   AIRFLOW_CTX_EXECUTION_DATE=2023-10-31T16:45:52.387595+00:00
   AIRFLOW_CTX_TRY_NUMBER=1
   AIRFLOW_CTX_DAG_RUN_ID=manual__2023-10-31T16:45:52.387595+00:00
   [2023-10-31, 16:48:17 UTC] {{glue.py:70}} INFO - Poking for job run status 
:for Glue Job RRSSqlExecutorHeavyV3G1X and ID 
jr_6726789ecc1a71a19427bc7d13ab3a06546fc684324a02d13aa8067f59b9b465
   [2023-10-31, 16:48:17 UTC] {{base.py:73}} INFO - Using connection ID 
'aws_default' for task execution.
   [2023-10-31, 16:48:17 UTC] {{taskinstance.py:1732}} INFO - Rescheduling 
task, marking task as UP_FOR_RESCHEDULE
   [2023-10-31, 16:48:17 UTC] {{local_task_job.py:208}} INFO - Task exited with 
return code 0
   [2023-10-31, 16:48:17 UTC] {{taskinstance.py:2578}} INFO - 0 downstream 
tasks scheduled from follow-on schedule check
   [2023-10-31, 16:49:20 UTC] {{taskinstance.py:1083}} INFO - Dependencies all 
met for <TaskInstance: sqlify_executor.wait_on_glue_job 
manual__2023-10-31T16:45:52.387595+00:00 [queued]>
   [2023-10-31, 16:49:20 UTC] {{taskinstance.py:1083}} INFO - Dependencies all 
met for <TaskInstance: sqlify_executor.wait_on_glue_job 
manual__2023-10-31T16:45:52.387595+00:00 [queued]>
   [2023-10-31, 16:49:20 UTC] {{taskinstance.py:1279}} INFO - 
   
--------------------------------------------------------------------------------
   [2023-10-31, 16:49:20 UTC] {{taskinstance.py:1280}} INFO - Starting attempt 
1 of 1
   [2023-10-31, 16:49:20 UTC] {{taskinstance.py:1281}} INFO - 
   
--------------------------------------------------------------------------------
   [2023-10-31, 16:49:20 UTC] {{taskinstance.py:1300}} INFO - Executing 
<Task(GlueJobSensor): wait_on_glue_job> on 2023-10-31 16:45:52.387595+00:00
   [2023-10-31, 16:49:20 UTC] {{standard_task_runner.py:55}} INFO - Started 
process 17125 to run task
   [2023-10-31, 16:49:20 UTC] {{standard_task_runner.py:82}} INFO - Running: 
['airflow', 'tasks', 'run', 'sqlify_executor', 'wait_on_glue_job', 
'manual__2023-10-31T16:45:52.387595+00:00', '--job-id', '11216', '--raw', 
'--subdir', 'DAGS_FOLDER/sqlify_executor.py', '--cfg-path', '/tmp/tmpjfbzqnjm']
   [2023-10-31, 16:49:20 UTC] {{standard_task_runner.py:83}} INFO - Job 11216: 
Subtask wait_on_glue_job
   [2023-10-31, 16:49:21 UTC] {{task_command.py:388}} INFO - Running 
<TaskInstance: sqlify_executor.wait_on_glue_job 
manual__2023-10-31T16:45:52.387595+00:00 [running]> on host 
ip-10-0-202-96.ec2.internal
   [2023-10-31, 16:49:21 UTC] {{taskinstance.py:1507}} INFO - Exporting the 
following env vars:
   AIRFLOW_CTX_DAG_OWNER=airflow
   AIRFLOW_CTX_DAG_ID=sqlify_executor
   AIRFLOW_CTX_TASK_ID=wait_on_glue_job
   AIRFLOW_CTX_EXECUTION_DATE=2023-10-31T16:45:52.387595+00:00
   AIRFLOW_CTX_TRY_NUMBER=1
   AIRFLOW_CTX_DAG_RUN_ID=manual__2023-10-31T16:45:52.387595+00:00
   [2023-10-31, 16:49:21 UTC] {{glue.py:70}} INFO - Poking for job run status 
:for Glue Job RRSSqlExecutorHeavyV3G1X and ID 
jr_6726789ecc1a71a19427bc7d13ab3a06546fc684324a02d13aa8067f59b9b465
   [2023-10-31, 16:49:21 UTC] {{base.py:73}} INFO - Using connection ID 
'aws_default' for task execution.
   [2023-10-31, 16:49:21 UTC] {{glue.py:81}} INFO - ('Exiting Job %s Run State: 
%s', 'jr_6726789ecc1a71a19427bc7d13ab3a06546fc684324a02d13aa8067f59b9b465', 
'FAILED')
   [2023-10-31, 16:49:21 UTC] {{taskinstance.py:1768}} ERROR - Task failed with 
exception
   Traceback (most recent call last):
     File 
"/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/sensors/base.py",
 line 199, in execute
       poke_return = self.poke(context)
     File 
"/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/providers/amazon/aws/sensors/glue.py",
 line 82, in poke
       raise AirflowException(job_error_message)
   airflow.exceptions.AirflowException: ('Exiting Job %s Run State: %s', 
'jr_6726789ecc1a71a19427bc7d13ab3a06546fc684324a02d13aa8067f59b9b465', 'FAILED')
   [2023-10-31, 16:49:21 UTC] {{taskinstance.py:1318}} INFO - Marking task as 
FAILED. dag_id=sqlify_executor, task_id=wait_on_glue_job, 
execution_date=20231031T164552, start_date=20231031T164920, 
end_date=20231031T164921
   [2023-10-31, 16:49:21 UTC] {{standard_task_runner.py:100}} ERROR - Failed to 
execute job 11216 for task wait_on_glue_job (('Exiting Job %s Run State: %s', 
'jr_6726789ecc1a71a19427bc7d13ab3a06546fc684324a02d13aa8067f59b9b465', 
'FAILED'); 17125)
   [2023-10-31, 16:49:22 UTC] {{local_task_job.py:208}} INFO - Task exited with 
return code 1
   [2023-10-31, 16:49:22 UTC] {{taskinstance.py:2578}} INFO - 0 downstream 
tasks scheduled from follow-on schedule check
   ```
   
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to