sadpandajoe opened a new pull request, #41177: URL: https://github.com/apache/superset/pull/41177
### SUMMARY

Three independent, low-risk reliability fixes in the alert/report subsystem. No DB migration, no API change, no UI change.

1. **Webhook retries could stall workers.** `WebhookNotification.send` retried via `backoff` with no wall-clock bound, so a hanging or persistently failing target could tie up a worker for minutes per bad URL (up to ~5 socket waits at `timeout=60`, i.e. roughly 300 seconds, plus retry sleeps), starving sequential report dispatch. Added `max_time=120` to the decorator. The retry parameters (`factor`/`base`/`max_tries`) are unchanged, so targets returning transient 5xx errors are still retried; `max_time` only caps total wall-clock time and is checked between attempts, so the final in-flight request still gets its full timeout.
2. **Opaque failure when the executor user is missing.** If the configured executor cannot be resolved (`security_manager.find_user` returns `None`), report execution previously failed later with an unclear `NoneType` error. The content-generation paths (`_get_screenshots` / `_get_csv_data` / `_get_embedded_data`) now raise a dedicated `ReportScheduleExecutorNotFoundError`. The guard sits at the content sites so that it raises inside the state machine's error envelope: the `ERROR` execution-log row and the owner error notification are still produced. The `run()` boundary continues to delegate to the state machine (a missing user is tolerated there, matching prior behavior), so operator visibility is unchanged.
3. **Slack v2 migration left recipients half-migrated on failure.** `update_report_schedule_slack_v2` reverted only the loop variable on error (leaving earlier-mutated recipients changed) and raised `UnboundLocalError` when the failure occurred before the loop bound a recipient. It now snapshots and reverts every recipient it mutated, and no longer crashes on a pre-loop failure.

### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A: backend-only change.
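For reviewers, the snapshot-and-revert pattern described in fix 3 of the summary can be sketched roughly as follows. This is an illustrative, standalone sketch and not the code in this PR: `Recipient`, `resolve_channel_id`, `migrate_recipients`, the type strings, and the channel table are hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Recipient:
    """Hypothetical stand-in for a report recipient row."""

    type: str
    config: str


def resolve_channel_id(name: str) -> str:
    """Hypothetical lookup; raises for channels it cannot resolve."""
    known = {"#alerts": "C001", "#reports": "C002"}
    if name not in known:
        raise LookupError(f"unknown channel: {name}")
    return known[name]


def migrate_recipients(recipients: list[Recipient]) -> bool:
    """Migrate every Slack recipient to v2, or leave all of them untouched."""
    # One (recipient, old_type, old_config) entry per recipient we mutate.
    # Bound before the loop, so the error handler can always read it.
    snapshots: list[tuple[Recipient, str, str]] = []
    try:
        for recipient in recipients:
            if recipient.type != "Slack":
                continue
            new_config = resolve_channel_id(recipient.config)
            snapshots.append((recipient, recipient.type, recipient.config))
            recipient.type = "SlackV2"
            recipient.config = new_config
    except LookupError:
        # Revert everything mutated so far, not just the current item.
        # The handler never touches the loop variable, so it cannot raise
        # UnboundLocalError even if the loop body never ran.
        for recipient, old_type, old_config in reversed(snapshots):
            recipient.type = old_type
            recipient.config = old_config
        return False
    return True
```

With this shape, a failure on the last recipient restores the earlier ones as well, and an empty recipient list or an early failure leaves nothing to revert.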
### TESTING INSTRUCTIONS

Unit tests added/updated:

- `tests/unit_tests/reports/notifications/webhook_tests.py`
- `tests/unit_tests/commands/report/execute_test.py`

```
pytest tests/unit_tests/reports/notifications/webhook_tests.py tests/unit_tests/commands/report/execute_test.py
```

All pass. Each fix has at least one test that fails when the fix is reverted.

### ADDITIONAL INFORMATION

- [ ] Has associated issue:
- [ ] Required feature flags:
- [ ] Changes UI
- [ ] Includes DB Migration (follow approval process in [SIP-59](https://github.com/apache/superset/issues/13351))
  - [ ] Migration is atomic, supports rollback & is backwards-compatible
  - [ ] Confirm DB migration upgrade and downgrade tested
  - [ ] Runtime estimates and downtime expectations provided
- [ ] Introduces new feature or API
- [ ] Removes existing feature or API

🤖 Generated with [Claude Code](https://claude.com/claude-code)
