rusackas opened a new pull request, #41250:
URL: https://github.com/apache/superset/pull/41250

   ### SUMMARY
   
   Alert/report execution makes several network calls with **no socket 
timeout**, so the underlying Python socket blocks *indefinitely* if the remote 
endpoint becomes unreachable. When that happens mid-execution, the report 
schedule is left in the `WORKING` state forever — every subsequent scheduler 
tick then raises `ReportSchedulePreviousWorkingError` (*"Report Schedule is 
still working, refusing to re-compute"*), and manually resetting the state via 
SQL only causes the next run to wedge the same way.
   
   This matches the symptoms in #40047 exactly: reports stuck in *"sending"* 
across **all** formats (CSV/PNG/PDF), affecting both new and existing reports, 
with **no logs after the run starts**, appearing after an environment change 
(e.g. SMTP host or `WEBDRIVER_BASEURL` becoming unreachable) without any image 
rebuild. Because the hang is in a blocking C-level socket read, Celery's 
`soft_time_limit` often can't interrupt it cleanly, and the `working_timeout` 
sweep only fires on a *later* tick — so the schedule stays wedged.
   
   Three previously unbounded calls are now bounded by configurable timeouts:
   
   | Call | File | New config (default) |
   |------|------|----------------------|
   | `smtplib.SMTP` / `SMTP_SSL` email send | `superset/utils/core.py` | `SMTP_TIMEOUT` (30s) |
   | `urllib` chart-data fetch for CSV/dataframe attachments | `superset/utils/csv.py` | `ALERT_REPORTS_CSV_REQUEST_TIMEOUT` (60s) |
   | Selenium `driver.get()` navigation | `superset/utils/webdriver.py` | `SCREENSHOT_PAGE_LOAD_WAIT` (120s) via `set_page_load_timeout` |
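
   For reference, the three keys from the table could be overridden in a 
deployment's config module roughly as follows. This is an illustrative 
fragment, not part of the diff: the values simply mirror the defaults listed 
above, and the file name `superset_config.py` is assumed to be the usual 
override location.

```python
# superset_config.py (illustrative override; values mirror the defaults above)

# Seconds to wait on the SMTP connection and on each server reply.
SMTP_TIMEOUT = 30

# Seconds to wait on the chart-data fetch used for CSV/dataframe attachments.
ALERT_REPORTS_CSV_REQUEST_TIMEOUT = 60

# Seconds Selenium waits for driver.get() to finish loading the page.
SCREENSHOT_PAGE_LOAD_WAIT = 120

# Per this description, setting any of these to None restores the old
# unbounded behavior.
```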
   
   With a finite timeout the failing call now **raises** instead of hanging; 
the report state machine transitions the schedule to `ERROR`, the failure is 
surfaced and can be retried, and the worker is freed. The SMTP/CSV timeouts 
fall back to their defaults for custom configs that predate the new keys. Any 
of the three settings can be set to `None` to restore the old unbounded 
behavior.
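
   The difference between a missing key and an explicit `None` can be 
sketched as below. `resolve_timeout` and the two sample configs are 
hypothetical names used only for illustration; the actual lookup in the diff 
may be written differently.

```python
from typing import Any, Mapping, Optional

# Defaults taken from the table in this description.
DEFAULT_SMTP_TIMEOUT = 30
DEFAULT_CSV_REQUEST_TIMEOUT = 60


def resolve_timeout(
    config: Mapping[str, Any], key: str, default: Optional[float]
) -> Optional[float]:
    """Hypothetical helper: use config[key] if the key exists, else the default.

    A key that is present but set to None is returned as None, which
    Python's socket layer treats as "block with no timeout".
    """
    return config.get(key, default)


# A custom config written before the new keys existed.
legacy_config = {"SMTP_HOST": "localhost"}
# A config that explicitly opts back into unbounded behavior.
opt_out_config = {"SMTP_TIMEOUT": None}

legacy_timeout = resolve_timeout(legacy_config, "SMTP_TIMEOUT", DEFAULT_SMTP_TIMEOUT)
opt_out_timeout = resolve_timeout(opt_out_config, "SMTP_TIMEOUT", DEFAULT_SMTP_TIMEOUT)
```

Here `legacy_timeout` picks up the default while `opt_out_timeout` stays 
`None`, so only an explicit opt-out removes the bound.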
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   
   N/A (backend reliability fix).
   
   ### TESTING INSTRUCTIONS
   
   - Unit: `pytest tests/unit_tests/utils/csv_tests.py 
tests/unit_tests/utils/webdriver_test.py`
   - Integration: `pytest tests/integration_tests/email_tests.py -k send_mime`
   - Manual: point `SMTP_HOST` (or `WEBDRIVER_BASEURL`) at an unroutable 
address (e.g. a blackhole IP) and trigger a report. Before this change the 
worker hangs and the schedule sticks in `WORKING`; after, the call times out, 
the report moves to `ERROR`, and the error is logged.
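
   The same failure mode can be approximated locally without a blackhole IP. 
The sketch below is a standalone stand-in, not Superset code: a listener on 
`127.0.0.1` completes the TCP handshake but never sends an SMTP greeting, and 
`smtplib.SMTP` is given a short timeout (0.5s, chosen only to keep the demo 
fast). Without the `timeout` argument the constructor would block on the 
greeting read.

```python
import smtplib
import socket
import time

# Local stand-in for an unresponsive SMTP host: the kernel completes the TCP
# handshake via the listen backlog, but nothing ever sends the SMTP greeting.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
host, port = listener.getsockname()

started = time.monotonic()
try:
    # The constructor connects and then waits for the server greeting.
    smtplib.SMTP(host, port, timeout=0.5)
    outcome = "connected"
except OSError as exc:
    # smtplib's exceptions and socket timeouts are all OSError subclasses.
    outcome = type(exc).__name__
elapsed = time.monotonic() - started
listener.close()

print(f"outcome={outcome} elapsed={elapsed:.1f}s")
```

The call returns control after roughly the configured timeout with an 
exception, which is the property the report state machine relies on to move 
the schedule to `ERROR`.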
   
   New tests cover: the SMTP timeout being passed (and defaulting when the key 
is absent), the CSV fetch forwarding the timeout to `opener.open`, and the 
Selenium driver applying / skipping `set_page_load_timeout`.
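
   The shape of those assertions can be sketched with `unittest.mock`. The 
helpers `fetch` and `apply_page_load_timeout` below are hypothetical stand-ins 
for the real functions, and treating `None` as the "skip" condition is an 
assumption based on the description above; the actual tests in the diff may be 
structured differently.

```python
from typing import Any, Optional
from unittest import mock


def fetch(opener: Any, url: str, timeout: Optional[float]) -> Any:
    """Hypothetical stand-in for the CSV fetch: forwards timeout to opener.open."""
    return opener.open(url, timeout=timeout)


def apply_page_load_timeout(driver: Any, page_load_wait: Optional[float]) -> None:
    """Hypothetical stand-in: bound page loads only when a timeout is configured."""
    if page_load_wait is not None:
        driver.set_page_load_timeout(page_load_wait)


# The CSV fetch should hand the configured timeout to opener.open.
opener = mock.Mock()
fetch(opener, "http://example.invalid/chart", timeout=60)
opener.open.assert_called_once_with("http://example.invalid/chart", timeout=60)

# With a timeout configured, the driver gets set_page_load_timeout.
driver = mock.Mock()
apply_page_load_timeout(driver, 120)
driver.set_page_load_timeout.assert_called_once_with(120)

# With None, the call is skipped and the driver keeps its own behavior.
unbounded_driver = mock.Mock()
apply_page_load_timeout(unbounded_driver, None)
unbounded_driver.set_page_load_timeout.assert_not_called()
```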
   
   ### ADDITIONAL INFORMATION
   
   - [x] Has associated issue: Fixes #40047
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

