rusackas opened a new pull request, #41250: URL: https://github.com/apache/superset/pull/41250
### SUMMARY

Alert/report execution makes several network calls with **no socket timeout**, so the underlying Python socket blocks *indefinitely* if the remote endpoint becomes unreachable. When that happens mid-execution, the report schedule is left in the `WORKING` state forever. Every subsequent scheduler tick then raises `ReportSchedulePreviousWorkingError` (*"Report Schedule is still working, refusing to re-compute"*), and manually resetting the state via SQL only causes the next run to wedge the same way.

This matches the symptoms in #40047 exactly: reports stuck in *"sending"* across **all** formats (CSV/PNG/PDF), affecting both new and existing reports, with **no logs after the run starts**, and appearing after an environment change (e.g. the SMTP host or `WEBDRIVER_BASEURL` becoming unreachable) without any image rebuild.

Because the hang is in a blocking C-level socket read, Celery's `soft_time_limit` often can't interrupt it cleanly, and the `working_timeout` sweep only fires on a *later* tick, so the schedule stays wedged.

Three previously unbounded calls are now bounded by configurable timeouts:

| Call | File | New config (default) |
|------|------|----------------------|
| `smtplib.SMTP` / `SMTP_SSL` email send | `superset/utils/core.py` | `SMTP_TIMEOUT` (30s) |
| `urllib` chart-data fetch for CSV/dataframe attachments | `superset/utils/csv.py` | `ALERT_REPORTS_CSV_REQUEST_TIMEOUT` (60s) |
| Selenium `driver.get()` navigation | `superset/utils/webdriver.py` | `SCREENSHOT_PAGE_LOAD_WAIT` (120s), applied via `set_page_load_timeout` |

With a finite timeout, the failing call now **raises** instead of hanging: the report state machine transitions the schedule to `ERROR`, the failure is surfaced/retried, and the worker is freed. The SMTP/CSV timeouts fall back to their defaults for custom configs that predate the new keys. Each of these settings can be set to `None` to restore the old unbounded behavior.

### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A (backend reliability fix).
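The three bounded calls from the summary can be sketched roughly as follows. This is a minimal illustration, not the code changed in this PR: it assumes a plain `dict` in place of Superset's Flask config, and the helper names (`resolve_timeout`, `send_via_smtp`, `fetch_chart_data`, `apply_page_load_timeout`) are hypothetical. Only the config keys and their defaults come from the PR description.

```python
import smtplib
import urllib.request
from email.message import EmailMessage
from typing import Any, Mapping, Optional

# Config keys and defaults as listed in the PR summary. The helper names and
# the dict-based config below are hypothetical stand-ins for illustration.
DEFAULT_TIMEOUTS = {
    "SMTP_TIMEOUT": 30,
    "ALERT_REPORTS_CSV_REQUEST_TIMEOUT": 60,
    "SCREENSHOT_PAGE_LOAD_WAIT": 120,
}


def resolve_timeout(config: Mapping[str, Any], key: str) -> Optional[float]:
    """Use the configured value, or the default if the key is missing.

    An explicit None is returned unchanged, which keeps the call unbounded.
    """
    return config.get(key, DEFAULT_TIMEOUTS[key])


def send_via_smtp(config: Mapping[str, Any], msg: EmailMessage) -> None:
    """Send an email; the connect and each socket read are capped by SMTP_TIMEOUT."""
    timeout = resolve_timeout(config, "SMTP_TIMEOUT")
    smtp_cls = smtplib.SMTP_SSL if config.get("SMTP_SSL") else smtplib.SMTP
    with smtp_cls(config["SMTP_HOST"], config["SMTP_PORT"], timeout=timeout) as server:
        server.send_message(msg)


def fetch_chart_data(
    config: Mapping[str, Any],
    url: str,
    opener: Optional[urllib.request.OpenerDirector] = None,
) -> bytes:
    """Fetch chart data; a silent endpoint raises after the timeout instead of hanging."""
    timeout = resolve_timeout(config, "ALERT_REPORTS_CSV_REQUEST_TIMEOUT")
    if opener is None:
        opener = urllib.request.build_opener()
    with opener.open(url, timeout=timeout) as response:
        return response.read()


def apply_page_load_timeout(driver: Any, config: Mapping[str, Any]) -> None:
    """Bound Selenium's driver.get(); with None, the driver's own setting is left as-is."""
    timeout = resolve_timeout(config, "SCREENSHOT_PAGE_LOAD_WAIT")
    if timeout is not None:
        # Seconds; a later driver.get() that exceeds it raises TimeoutException.
        driver.set_page_load_timeout(timeout)
```

Using `config.get(key, default)` distinguishes a missing key (the default applies, mirroring the fallback for custom configs that predate the new keys) from an explicit `None` (the opt-out back to unbounded behavior).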
### TESTING INSTRUCTIONS

- Unit: `pytest tests/unit_tests/utils/csv_tests.py tests/unit_tests/utils/webdriver_test.py`
- Integration: `pytest tests/integration_tests/email_tests.py -k send_mime`
- Manual: point `SMTP_HOST` (or `WEBDRIVER_BASEURL`) at an unroutable address (e.g. a blackhole IP) and trigger a report. Before this change, the worker hangs and the schedule sticks in `WORKING`; after it, the call times out, the report moves to `ERROR`, and the error is logged.

New tests cover the SMTP timeout being passed (and defaulting when the key is absent), the CSV fetch forwarding the timeout to `opener.open`, and the Selenium driver applying or skipping `set_page_load_timeout`.

### ADDITIONAL INFORMATION

- [x] Has associated issue: Fixes #40047
- [ ] Required feature flags:
- [ ] Changes UI
- [ ] Includes DB Migration
- [ ] Introduces new feature or API
- [ ] Removes existing feature or API

🤖 Generated with [Claude Code](https://claude.com/claude-code)
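For a quick local approximation of the manual check, the sketch below stands in for an unreachable SMTP host with a localhost socket that listens but never accepts or replies. It differs from a true blackhole IP: there the TCP connect itself stalls, while here the connect succeeds and the wait is on the SMTP greeting. In both cases a finite `timeout` turns the hang into an exception. The helper names are hypothetical and not part of the Superset test suite.

```python
import smtplib
import socket
import time
from typing import Tuple


def start_silent_listener() -> socket.socket:
    """Listen on an ephemeral localhost port but never accept() or reply.

    The kernel still completes the TCP handshake (the connection waits in the
    listen backlog), so an SMTP client connects and then blocks reading the
    220 greeting banner.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    return listener


def time_smtp_connect(host: str, port: int, timeout: float) -> Tuple[bool, float]:
    """Return (raised, elapsed_seconds) for an SMTP connection attempt."""
    start = time.monotonic()
    try:
        with smtplib.SMTP(host, port, timeout=timeout):
            pass
    except OSError:
        # smtplib reports the banner-read timeout as SMTPServerDisconnected,
        # which, like socket timeouts, is a subclass of OSError.
        return True, time.monotonic() - start
    return False, time.monotonic() - start


if __name__ == "__main__":
    listener = start_silent_listener()
    host, port = listener.getsockname()
    raised, elapsed = time_smtp_connect(host, port, timeout=2.0)
    print(f"raised={raised} after {elapsed:.1f}s")  # instead of blocking forever
    listener.close()
```

Passing `timeout=None` here would reproduce the original symptom: the client would wait on the banner indefinitely.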
