sadpandajoe opened a new pull request, #41177:
URL: https://github.com/apache/superset/pull/41177

   ### SUMMARY
   
   Three independent, low-risk reliability fixes in the alert/report subsystem. 
No DB migration, no API change, no UI change.
   
   1. **Webhook retries could stall workers.** `WebhookNotification.send` 
retried via `backoff` with no wall-clock bound, so a hanging or 
persistently-failing target could tie up a worker for minutes per bad URL (up 
to ~5 socket waits at `timeout=60` plus retry sleeps), starving sequential 
report dispatch. Added `max_time=120` to the decorator. The existing retry 
parameters (`factor`/`base`/`max_tries`) are unchanged, so legitimately 
transient 5xx targets are still retried; `max_time` only caps total wall-clock 
time and is checked between attempts, so the final in-flight request still gets 
its full timeout.
   
   2. **Opaque failure when the executor user is missing.** If the configured 
executor cannot be resolved (`security_manager.find_user` returns `None`), 
report execution previously failed later with an unclear `NoneType` error. The 
content-generation paths (`_get_screenshots` / `_get_csv_data` / 
`_get_embedded_data`) now raise a dedicated 
`ReportScheduleExecutorNotFoundError`. The guard sits at the content sites so 
it raises inside the state machine's error envelope — the `ERROR` execution-log 
row and the owner error notification are still produced. The `run()` boundary 
continues to delegate to the state machine (a missing user is tolerated there, 
matching prior behavior), so operator visibility is unchanged.
   
   3. **Slack v2 migration left recipients half-migrated on failure.** 
`update_report_schedule_slack_v2` reverted only the loop variable on error 
(leaving earlier-mutated recipients changed) and raised `UnboundLocalError` 
when the failure occurred before the loop bound a recipient. It now snapshots 
and reverts every recipient it mutated, and no longer crashes on a pre-loop 
failure.
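
The `max_time` behavior described in item 1 can be illustrated with a stdlib-only sketch. This is not the `backoff` library's code and not the actual Superset decorator; `retry_with_deadline`, `base_delay`, `clock`, and `sleep` are hypothetical names introduced here so the timing can be simulated. It assumes, as stated above, that the wall-clock cap is checked only between attempts.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_deadline(
    func: Callable[[], T],
    *,
    max_tries: int = 5,
    max_time: float = 120.0,
    base_delay: float = 1.0,  # hypothetical; stands in for factor/base
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry func with exponential delays, bounded by tries and wall-clock time."""
    start = clock()
    attempt = 0
    while True:
        attempt += 1
        try:
            # The attempt itself is never interrupted, so an in-flight
            # request still gets its own full socket timeout.
            return func()
        except Exception:
            elapsed = clock() - start
            # Both limits are evaluated only after an attempt has failed.
            if attempt >= max_tries or elapsed >= max_time:
                raise
            # Exponential delay, clipped so the sleep does not overshoot
            # the remaining wall-clock budget.
            delay = min(base_delay * 2 ** (attempt - 1), max_time - elapsed)
            sleep(delay)
```

Under these assumptions, a target that hangs for a full 60-second timeout on every attempt is abandoned after the second attempt with `max_time=120`, rather than after all five tries.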
   
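The snapshot-and-revert approach from item 3 can be sketched as follows. This is a simplified illustration, not the body of `update_report_schedule_slack_v2`: the `Recipient` class, the `"Slack"`/`"SlackV2"` strings, `migrate_recipients`, and the `resolve_channels` callback are hypothetical stand-ins, and re-raising the original exception is an assumption about error handling.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Recipient:  # hypothetical stand-in for a report recipient row
    type: str
    recipient_config_json: str


def migrate_recipients(
    recipients: Iterable[Recipient],
    resolve_channels: Callable[[str], str],
) -> None:
    """Migrate every Slack recipient, or leave all of them untouched."""
    # Initialized before the loop, so the error path is safe even when the
    # failure happens before any recipient has been bound.
    snapshots: List[Tuple[Recipient, str, str]] = []
    try:
        for recipient in recipients:
            if recipient.type != "Slack":
                continue
            # Record the original state *before* mutating anything.
            snapshots.append(
                (recipient, recipient.type, recipient.recipient_config_json)
            )
            recipient.type = "SlackV2"
            recipient.recipient_config_json = resolve_channels(
                recipient.recipient_config_json
            )
    except Exception:
        # Revert every recipient touched so far, not just the current one.
        for touched, old_type, old_config in snapshots:
            touched.type = old_type
            touched.recipient_config_json = old_config
        raise
```

Because the revert loop walks the snapshot list rather than the loop variable, a failure on a later recipient also restores the earlier ones, and an empty list makes a pre-loop failure a no-op on the revert path.
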
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   
   N/A — backend-only change.
   
   ### TESTING INSTRUCTIONS
   
   Unit tests added/updated:
   
   - `tests/unit_tests/reports/notifications/webhook_tests.py`
   - `tests/unit_tests/commands/report/execute_test.py`
   
   ```
   pytest tests/unit_tests/reports/notifications/webhook_tests.py tests/unit_tests/commands/report/execute_test.py
   ```
   
   All pass. Each fix has at least one test that fails when the fix is reverted.
   
   ### ADDITIONAL INFORMATION
   
   - [ ] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in 
[SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

