henry3260 opened a new pull request, #59392:
URL: https://github.com/apache/airflow/pull/59392

   
   closes: #59075
   
   ## Description
   
   Add `resume_glue_job_on_retry` parameter to **GlueJobOperator** to prevent 
duplicate AWS Glue job runs during task retries.
   
   ### Problem
   
   When a GlueJobOperator task is retried after a failure, the operator always 
creates a new AWS Glue job run, even if the run started by the previous attempt 
is still active. This leads to:
   - Multiple concurrent job runs for the same task execution
   - Wasted resources and costs
   - Confusing job history and tracking
   
   ### Solution
   
   Introduce `resume_glue_job_on_retry` parameter that enables **idempotent 
retry behavior**:
   
   1. When enabled, the operator checks if a previous job run is still in 
progress (RUNNING, STARTING, or STOPPING states)
   2. If in progress, reuses the existing `job_run_id` instead of creating a 
new one
   3. If the previous job is finished (SUCCEEDED, FAILED, etc.), creates a new 
job run as normal
   4. Previous job state is tracked via XCom across retries
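
The decision in steps 1-3 can be sketched as a small pure function. This is an illustrative sketch only, not the code from this PR: `choose_job_run_id`, `get_state`, and `start_new_run` are hypothetical names, and the two callables stand in for the Glue API lookup and the call that starts a new run.

```python
from typing import Callable, Optional

# States the description above treats as "still in progress".
IN_PROGRESS_STATES = frozenset({"RUNNING", "STARTING", "STOPPING"})


def choose_job_run_id(
    previous_run_id: Optional[str],
    get_state: Callable[[str], str],
    start_new_run: Callable[[], str],
) -> str:
    """Return the run id to monitor on this attempt.

    Hypothetical helper: reuse the previous run if it is still active,
    otherwise start a new run.
    """
    if previous_run_id is not None and get_state(previous_run_id) in IN_PROGRESS_STATES:
        # A run from an earlier attempt is still active, so reuse it.
        return previous_run_id
    # No earlier run, or it already finished: start a fresh run as normal.
    return start_new_run()
```

Keeping the state lookup and the run creation behind callables keeps the branch logic testable without AWS credentials.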
   
   ### Changes Made
   
   **GlueJobOperator (glue.py)**:
   - Added `resume_glue_job_on_retry: bool = False` parameter to `__init__`
   - Extended `execute()` to check the previous job state from XCom when the 
parameter is enabled
   - Queries the AWS Glue API (`get_job_run()`) to verify the job state before 
deciding whether to create a new run
   - Handles exceptions so the operator falls back gracefully if the XCom 
lookup or the Glue API call fails
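
A minimal sketch of the graceful fallback described above, under stated assumptions: `find_resumable_run_id` and `pull_previous_run_id` are hypothetical names, the client is assumed to be a boto3-style Glue client whose `get_job_run(JobName=..., RunId=...)` response carries the state under `["JobRun"]["JobRunState"]`, and "fallback" is assumed to mean starting a new run. The actual operator code in this PR may be structured differently.

```python
import logging
from typing import Any, Callable, Optional

log = logging.getLogger(__name__)

# States treated as "still in progress" in the PR description.
IN_PROGRESS_STATES = frozenset({"RUNNING", "STARTING", "STOPPING"})


def find_resumable_run_id(
    pull_previous_run_id: Callable[[], Optional[str]],
    glue_client: Any,
    job_name: str,
) -> Optional[str]:
    """Return a previous run id that is safe to resume, or None.

    Hypothetical helper: any failure in the XCom lookup or the Glue call
    yields None, so the caller can fall back to starting a new run.
    """
    try:
        run_id = pull_previous_run_id()  # stands in for an XCom pull
        if not run_id:
            return None
        response = glue_client.get_job_run(JobName=job_name, RunId=run_id)
        state = response["JobRun"]["JobRunState"]
    except Exception as exc:  # broad on purpose: never fail the task here
        log.warning("Could not check previous Glue job run: %s", exc)
        return None
    return run_id if state in IN_PROGRESS_STATES else None
```

Returning `None` on any error means a broken lookup never blocks the task; the caller simply starts a new run, as it does when the parameter is disabled.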
   
   **Unit Tests (test_glue.py)**:
   - `test_check_previous_job_id_run_reuse_in_progress`: Verifies previous 
job_run_id is reused when job is RUNNING
   - `test_check_previous_job_id_run_new_on_finished`: Verifies new job is 
created when previous job is SUCCEEDED
   
   ### Backward Compatibility
   
   ✅ Fully backward compatible - parameter defaults to `False`, maintaining 
existing behavior by default.
   

