henry3260 commented on PR #59392:
URL: https://github.com/apache/airflow/pull/59392#issuecomment-3879077546

   Hi @wilsonhooi86! Yes, the retry will find the same `previous_glue_job_id` 
and will not create a new Glue job run, because when GlueJobOperator retries, 
it looks up only the `glue_job_run_id` recorded for its own `task_id`.
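
To illustrate why parallel tasks that share one Glue job name do not interfere with each other, here is a minimal sketch of a per-task lookup. It is not the code from this PR: `FakeGlue`, `run_or_resume`, the plain-dict `store`, and the set of "in progress" states are all hypothetical stand-ins, and a real implementation would likely key the stored run id on more than `task_id` (for example the DAG run as well).

```python
from dataclasses import dataclass, field

# Assumption: which Glue run states count as "still in progress" for a resume.
IN_PROGRESS_STATES = {"STARTING", "RUNNING", "WAITING"}


@dataclass
class FakeGlue:
    """Hypothetical in-memory stand-in for the Glue API (not boto3)."""

    states: dict = field(default_factory=dict)
    counter: int = 0

    def start_job_run(self, job_name: str, args: dict) -> str:
        # Each call creates a new run id, like starting a new Glue job run.
        self.counter += 1
        run_id = f"jr_{self.counter}"
        self.states[run_id] = "RUNNING"
        return run_id

    def get_state(self, run_id: str) -> str:
        return self.states[run_id]


def run_or_resume(glue: FakeGlue, store: dict, task_id: str,
                  job_name: str, args: dict) -> str:
    """Return the run id this task should wait on.

    The lookup is keyed by task_id, so a retry of one task never sees
    run ids that belong to sibling tasks using the same job_name.
    """
    previous = store.get(task_id)
    if previous is not None and glue.get_state(previous) in IN_PROGRESS_STATES:
        return previous  # resume: do not start a second run
    run_id = glue.start_job_run(job_name, args)
    store[task_id] = run_id
    return run_id


JOB = "glue_job_database_name_1"
glue, store = FakeGlue(), {}

# First attempt of the three parallel tasks: three separate runs.
first = {
    t: run_or_resume(glue, store, t, JOB, {"--tbl_name": t})
    for t in ("table_1", "table_2", "table_3")
}

# table_3's task fails on the Airflow side and retries while its run is
# still in progress: the retry picks up its own earlier run id.
retried = run_or_resume(glue, store, "table_3", JOB, {"--tbl_name": "table_3"})
```

In this sketch `retried` equals `first["table_3"]` and no fourth run is started, while the runs for `table_1` and `table_2` are never consulted.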
   
   > Good day @henry3260,
   > 
   > Happy New Year and thank you so much for taking the initiative to add this 
feature. It will be helpful.
   > 
   > I would like to clarify a specific scenario regarding a Glue job named 
`glue_job_database_name_1`. This job is designed to handle a single schema but 
uses a `tbl_name` argument to process different tables dynamically. The script 
logic adapts based on the table name passed during execution.
   > 
   > Assume one DAG with three GlueJobOperator tasks that call the same Glue 
job, `glue_job_database_name_1`, and run in parallel.
   > 
   > Suppose the Glue jobs for `task_id="table_1"` and `task_id="table_2"` are 
still running. If `task_id="table_3"` suddenly fails due to some internal error 
and retries, will it be able to find the same `previous_glue_job_id` and avoid 
creating a new Glue job run?
   > 
   > ```
   > from airflow.providers.amazon.aws.operators.glue import GlueJobOperator
   > 
   > table_1 = GlueJobOperator(
   >     task_id="table_1",
   >     job_name="glue_job_database_name_1",
   >     verbose=False,
   >     script_args={
   >         "--tbl_name": "table_1",
   >     },
   >     resume_glue_job_on_retry=True,
   >     retry_limit=3,
   > )
   > 
   > table_2 = GlueJobOperator(
   >     task_id="table_2",
   >     job_name="glue_job_database_name_1",
   >     verbose=False,
   >     script_args={
   >         "--tbl_name": "table_2",
   >     },
   >     resume_glue_job_on_retry=True,
   >     retry_limit=3,
   > )
   > 
   > table_3 = GlueJobOperator(
   >     task_id="table_3",
   >     job_name="glue_job_database_name_1",
   >     verbose=False,
   >     script_args={
   >         "--tbl_name": "table_3",
   >     },
   >     resume_glue_job_on_retry=True,
   >     retry_limit=3,
   > )
   > ```
   > 
   > Thanks, and let me know if you need further clarification.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
