shivaam opened a new pull request, #63086:
URL: https://github.com/apache/airflow/pull/63086

   ### Why
   
   When using `GlueJobOperator(deferrable=True, verbose=True)`, CloudWatch logs 
are silently dropped. The `verbose` flag is stored in the trigger but never 
read: the inherited `AwsBaseWaiterTrigger.run()` only polls job status via a 
boto3 waiter and has no log-fetching logic.
   
   This means users who switch from `deferrable=False` to `deferrable=True` 
lose all verbose CloudWatch log output with no warning.
   
   closes: #56535
   
   ### What
   
   Added a `run()` override and `_forward_logs()` helper to 
`GlueJobCompleteTrigger` in `triggers/glue.py`:
   
   - When `verbose=False`: delegates to `super().run()` — zero behavior change.
   - When `verbose=True`: custom async poll loop that checks job state and 
streams logs from both `/output` and `/error` CloudWatch log groups using 
`get_log_events` with continuation tokens.
   - Log format matches the sync path (`GlueJobHook.print_job_logs`): 
tab-indented lines prefixed with `Glue Job Run <log_group> Logs:`, and `No new 
log from the Glue Job in <log_group>` when idle.
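
A minimal sketch of the token-based forwarding described above, not the code from this PR. `forward_logs`, `FakeLogsClient`, and the `emit` parameter are hypothetical names chosen for illustration. Only the `get_log_events` request and response fields (`logGroupName`, `logStreamName`, `startFromHead`, `nextToken`, `nextForwardToken`) are assumed to match the CloudWatch Logs API.

```python
import asyncio


class FakeLogsClient:
    """Hypothetical stand-in for an async CloudWatch Logs client (illustration only)."""

    def __init__(self, pages):
        # pages maps the incoming token (None on the first call) to a response dict.
        self._pages = pages

    async def get_log_events(self, **kwargs):
        return self._pages[kwargs.get("nextToken")]


async def forward_logs(client, log_group, log_stream, next_token=None, emit=print):
    """Drain new events from one log stream and return the token to resume from."""
    messages = []
    while True:
        kwargs = {
            "logGroupName": log_group,
            "logStreamName": log_stream,
            "startFromHead": True,
        }
        if next_token:
            kwargs["nextToken"] = next_token
        response = await client.get_log_events(**kwargs)
        messages.extend(event["message"] for event in response["events"])
        new_token = response.get("nextForwardToken")
        # At the end of the stream, the API hands back the token that was passed in.
        if not new_token or new_token == next_token:
            break
        next_token = new_token

    if messages:
        # Tab-indented lines under a per-log-group header, as in the sync path.
        body = "\n".join("\t" + message for message in messages)
        emit(f"Glue Job Run {log_group} Logs:\n{body}")
    else:
        emit(f"No new log from the Glue Job in {log_group}")
    return next_token
```

Returning the token lets the caller resume from the same position on the next poll, so each cycle emits only events that arrived since the previous one.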
   
   ### How
   
   Follows the same pattern as the ECS `TaskDoneTrigger._forward_logs()`, which 
already does async CloudWatch log tailing in this codebase. Uses 
`get_log_events` (async, token-based) instead of the sync path's 
`filter_log_events` (paginator-based), but produces the same user-facing log 
format.
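
For illustration, the overall shape of a verbose poll loop might look like the sketch below. It is not the implementation in `triggers/glue.py`: `poll_until_done`, its callback parameters, the returned dict, and the choice of which Glue run states count as failures are all assumptions. An Airflow trigger would normally yield a `TriggerEvent` instead of returning a value.

```python
import asyncio

# Assumption: these Glue job run states are treated as terminal failures.
FAILURE_STATES = {"FAILED", "STOPPED", "TIMEOUT", "ERROR"}


async def poll_until_done(
    get_state,
    forward_logs,
    poll_interval=30.0,
    max_attempts=60,
    sleep=asyncio.sleep,
):
    """Poll the job state, forwarding logs on every cycle, until it finishes."""
    for _ in range(max_attempts):
        state = await get_state()
        # Forward logs before acting on the state so the final lines are not lost.
        await forward_logs()
        if state == "SUCCEEDED":
            return {"status": "success"}
        if state in FAILURE_STATES:
            raise RuntimeError(f"Glue job ended in state {state}")
        await sleep(poll_interval)
    raise TimeoutError(f"Glue job still running after {max_attempts} attempts")
```

Passing `get_state`, `forward_logs`, and `sleep` as callables keeps the loop testable without AWS access, which is one way to cover the success, failure, and max-attempts cases in unit tests.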
   
   ### Testing
   
   - 7 unit tests covering: success, failure, max attempts exceeded, 
ResourceNotFoundException, pagination, log format verification, and 
no-new-events case. All pass.
   - Manually tested with a real Glue Python Shell job (20 steps, 15s apart) 
running both sync and deferrable tasks side by side in Breeze.
   
   **Sync task output:**
   ```
   INFO - Glue Job Run /aws-glue/python-jobs/output Logs:
        Processing step 4/20...
   INFO - No new log from the Glue Job in /aws-glue/python-jobs/error
   INFO - Polling for AWS Glue Job test-verbose-logging-sync current run state 
with status RUNNING
   INFO - No new log from the Glue Job in /aws-glue/python-jobs/output
   INFO - Glue Job Run /aws-glue/python-jobs/output Logs:
        Processing step 5/20...
   ```
   
   **Deferrable task output (after fix):**
   ```
   INFO - Glue Job Run /aws-glue/python-jobs/output Logs:
        Processing step 4/20...
        Processing step 5/20...
   WARNING - No new Glue driver logs so far.
   INFO - Polling for AWS Glue Job test-verbose-logging-deferrable current run 
state: RUNNING
   INFO - Glue Job Run /aws-glue/python-jobs/output Logs:
        Processing step 6/20...
        Processing step 7/20...
   ```
   
   The deferrable path batches more steps per poll cycle because it polls every 
30s rather than every 6s, but the log format is now consistent with the sync path.
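
The batching follows from the intervals alone. As a rough check, assuming the test job logs one step exactly every 15s and ignoring CloudWatch delivery latency, the hypothetical helper below counts how many steps fall into each poll window.

```python
def steps_per_poll(poll_interval_s, step_interval_s, polls):
    """Count how many job steps complete within each consecutive poll window."""
    counts = []
    for i in range(polls):
        start = i * poll_interval_s
        end = (i + 1) * poll_interval_s
        # Steps are assumed to log at step_interval_s, 2 * step_interval_s, ...
        counts.append(end // step_interval_s - start // step_interval_s)
    return counts


print(steps_per_poll(30, 15, 3))  # deferrable: [2, 2, 2]
print(steps_per_poll(6, 15, 5))   # sync: [0, 0, 1, 0, 1]
```

Under these assumptions a 30s poll picks up two steps at a time, while a 6s poll sees at most one step and often none, which is consistent with the `No new log` lines in the sync output.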
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes — Kiro (Claude Opus 4.6)

