BharathPESU opened a new pull request, #57861:
URL: https://github.com/apache/airflow/pull/57861
Fixes a critical bug in CloudRunExecuteJobOperator where jobs with
deferrable=True are incorrectly marked as successful when Cloud Run Jobs are
canceled or have failed tasks.
Problem
When using CloudRunExecuteJobOperator with deferrable=True, the operator
would mark tasks as successful even when:
Jobs were canceled - Not all tasks completed execution
Tasks failed - Some tasks within the job failed
Malformed trigger events - Missing required fields could cause KeyError or
produce unclear error messages
The root cause was that the trigger only checked operation.done without
validating the actual execution state (task completion counts), and the
operator didn't defensively validate the event payload.
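To make the root cause concrete, here is a minimal sketch that uses plain Python dataclasses as stand-ins for the Cloud Run operation and execution objects. The class names and both helper functions are hypothetical and are not taken from the provider code; only the field names task_count, succeeded_count, and failed_count come from this description.

```python
from dataclasses import dataclass


@dataclass
class FakeExecution:
    """Hypothetical stand-in for the execution details of a job run."""
    task_count: int
    succeeded_count: int
    failed_count: int


@dataclass
class FakeOperation:
    """Hypothetical stand-in for a long-running operation."""
    done: bool
    execution: FakeExecution


def old_check(op: FakeOperation) -> bool:
    # Pre-fix behaviour as described above: "done" alone counts as success.
    return op.done


def new_check(op: FakeOperation) -> bool:
    # Post-fix idea: done, every task accounted for, and no task failed.
    ex = op.execution
    all_finished = ex.succeeded_count + ex.failed_count == ex.task_count
    return op.done and all_finished and ex.failed_count == 0


# A job canceled after 3 of 5 tasks: the operation is done but incomplete.
canceled = FakeOperation(done=True, execution=FakeExecution(5, 3, 0))
print(old_check(canceled), new_check(canceled))
```

With the canceled example, the first check reports success while the second does not, which is the false positive this PR addresses.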
Solution
This PR implements a three-layer defensive validation approach:
1. Trigger Layer (cloud_run.py trigger)
Extracts execution details (task_count, succeeded_count, failed_count) from
the operation response using protobuf Unpack()
Validates the Unpack() return value and raises clear exceptions if
deserialization fails
Includes operation name and job name in error messages for debugging
2. Operator Layer (cloud_run.py operator)
Defensive event validation: Uses event.get() to prevent KeyError on malformed
events
Execution state validation:
Checks succeeded_count + failed_count == task_count to detect canceled jobs
Checks failed_count > 0 to detect failed tasks
Validates presence of required execution fields
Improved error messages: Replaces None/empty error details with meaningful
defaults ("Unknown", "Unknown error")
3. Test Coverage
Added comprehensive test coverage for:
✅ Successful job execution with all tasks completed
✅ Canceled jobs (incomplete task execution)
✅ Jobs with failed tasks
✅ Malformed events missing status field
✅ Malformed events missing execution fields
✅ Operation failures with missing error details (None values)
✅ Protobuf Unpack() failure scenarios
Changes
Modified Files:
cloud_run.py
cloud_run.py
test_cloud_run.py
test_cloud_run.py
Statistics:
2 files changed in source code
2 test files enhanced
236 insertions total across both commits
4 deletions
Testing
All changes include comprehensive unit test coverage. Tests verify:
Success path with valid execution details
Failure detection for canceled jobs
Failure detection for jobs with failed tasks
Error handling for malformed trigger events
Meaningful error messages for missing operation error details
Protobuf Unpack() failure handling
Example Error Messages
Before this fix:
After this fix:
Checklist
Bug fix (non-breaking change which fixes an issue)
New tests added to cover the changes
All tests pass locally
Follows code style guidelines
Includes appropriate error handling
Error messages are clear and actionable
Related Issues
Note: This fix prevents false positives in production environments where
canceled or partially completed Cloud Run Jobs would be incorrectly reported as
successful, potentially leading to data inconsistencies or missed error
conditions.