wilsonhooi86 opened a new issue, #62353:
URL: https://github.com/apache/airflow/issues/62353

   ### Apache Airflow version
   
   2.11.X
   
   ### If "Other Airflow 3 version" selected, which one?
   
   MWAA 2.11.0
   
   ### What happened?
   
   Hi,
   
   I ran the GlueJobOperator with `resume_glue_job_on_retry=True`. When the
   task failed and was retried, the operator started a new Glue job run (with a
   new job run ID) while the previous job run was still running.
   
   Sample of the GlueJobOperator:
   
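   ```py
   from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

   # var_glue_job_name, var_database_name, var_glue_pool and
   # botocore_overide_config are defined elsewhere in the DAG file.
   drug_program = GlueJobOperator(
       task_id="drug_program",
       job_name="gtm-core-dl-dev-euc1-glue-job-glo_informa_prd",
       verbose=False,
       script_args={
           "--name": var_glue_job_name,
           "--db_name": var_database_name,
           "--tbl_name": "drug_program",
       },
       stop_job_run_on_kill=False,
       deferrable=False,
       pool=var_glue_pool,
       botocore_config=botocore_overide_config,
       # Expected to reattach to the still-running job run on retry.
       resume_glue_job_on_retry=True,
   )
   ```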
   
   1st Run:
   A new Glue job run ID is created, as shown in the screenshot.
   <img width="1956" height="825" alt="Image" src="https://github.com/user-attachments/assets/bfe92130-dd26-4141-a9b4-ca16d17f2718" />
   
   
   2nd Run (1st Retry):
   
   The job failed due to internal errors. On retry, the operator did not pick
   up the existing Glue job run ID and started a new job run instead. As a
   result, two runs of the same Glue job were running at the same time.
   
   <img width="1756" height="756" alt="Image" src="https://github.com/user-attachments/assets/04466625-4377-48a4-b4d4-0f58367bdc85" />
   
   
   ### What you think should happen instead?
   
   During a task retry, the operator should check for the existing Glue job
   run ID. If one is found, it should not start a new job run and should
   instead resume tracking that run's status in the Airflow task.
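   
   The expected behaviour can be sketched roughly as below. This is only an
   illustration, not the provider's actual implementation: the function
   `resolve_job_run_id` and the `previous_run_id` argument are hypothetical
   names, and how the previous run ID is carried between tries (for example
   via XCom) is an assumption. `get_job_run` and `start_job_run` are the
   boto3 Glue client calls; the client is passed in so the sketch runs
   without AWS access.

```python
from typing import Any, Optional

# Glue job run states that mean the run is still in progress.
ACTIVE_STATES = {"STARTING", "RUNNING", "STOPPING", "WAITING"}


def resolve_job_run_id(
    glue_client: Any,
    job_name: str,
    previous_run_id: Optional[str],
    arguments: Optional[dict] = None,
) -> str:
    """Return the run ID to monitor (hypothetical helper).

    Reuses the previous run if it is still active; otherwise starts a new run.
    """
    if previous_run_id:
        response = glue_client.get_job_run(JobName=job_name, RunId=previous_run_id)
        state = response["JobRun"]["JobRunState"]
        if state in ACTIVE_STATES:
            # Resume monitoring instead of starting a duplicate run.
            return previous_run_id
    # No previous run, or it already finished: start a fresh run.
    response = glue_client.start_job_run(JobName=job_name, Arguments=arguments or {})
    return response["JobRunId"]
```

   With this logic, a retry that finds the earlier run in `RUNNING` state
   returns the same run ID, so only one Glue job run exists at a time.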
   
   ### How to reproduce
   
   1. Install `apache-airflow-providers-amazon==9.22.0rc1` in an MWAA Airflow
      2.11 environment.
   2. Create a DAG that uses GlueJobOperator.
   
   Sample of the GlueJobOperator:
   
   ```py
   drug_program = GlueJobOperator(
       task_id="drug_program",
       job_name="gtm-core-dl-dev-euc1-glue-job-glo_informa_prd",
       verbose=False,
       script_args={
           "--name": var_glue_job_name,
           "--db_name": var_database_name,
           "--tbl_name": "drug_program",
       },
       stop_job_run_on_kill=False,
       deferrable=False,
       pool=var_glue_pool,
       botocore_config=botocore_overide_config,
       resume_glue_job_on_retry=True,
   )
   ```
   
   3. Run the Glue job task.
   4. Make the first attempt fail (to test, mark the task as failed).
   5. The task goes up for retry, but it starts a new Glue job run instead of
      resuming the existing one.
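   
   To confirm the duplicate after step 5 without relying on the console, the
   active runs of the job can be listed. The following is a minimal sketch:
   `active_run_ids` is a hypothetical helper, `get_job_runs` is the boto3
   Glue client call, and only the first page of results is read (no
   `NextToken` handling).

```python
from typing import Any, List

# Glue job run states that mean the run is still in progress.
ACTIVE_STATES = {"STARTING", "RUNNING", "STOPPING", "WAITING"}


def active_run_ids(glue_client: Any, job_name: str) -> List[str]:
    """Return the IDs of job runs still in progress (first result page only)."""
    response = glue_client.get_job_runs(JobName=job_name)
    return [
        run["Id"]
        for run in response["JobRuns"]
        if run["JobRunState"] in ACTIVE_STATES
    ]
```

   Called with `boto3.client("glue")` and the job name from the sample above,
   more than one ID in the result during the retry would indicate the
   behaviour described in this issue.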
   
   ### Operating System
   
   Amazon Linux 2023
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon==9.22.0rc1
   
   ### Deployment
   
   Amazon (AWS) MWAA
   
   ### Deployment details
   
   MWAA 2.11.0 deployment with requirements to install 
`apache-airflow-providers-amazon==9.22.0rc1`
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

