codeant-ai-for-open-source[bot] commented on code in PR #41250:
URL: https://github.com/apache/superset/pull/41250#discussion_r3459935671
##########
superset/utils/core.py:
##########
@@ -948,9 +953,11 @@ def send_mime_email(
# root CA certificates
ssl_context = ssl.create_default_context() if smtp_ssl_server_auth else
None
smtp = (
- smtplib.SMTP_SSL(smtp_host, smtp_port, context=ssl_context)
+ smtplib.SMTP_SSL(
+ smtp_host, smtp_port, context=ssl_context, timeout=smtp_timeout
+ )
if smtp_ssl
- else smtplib.SMTP(smtp_host, smtp_port)
+ else smtplib.SMTP(smtp_host, smtp_port, timeout=smtp_timeout)
)
Review Comment:
**Suggestion:** The SMTP connection is not protected by a `try/finally` (or
context manager), so when the new timeout causes `starttls`, `login`, or
`sendmail` to raise, `quit()` is skipped and sockets can accumulate in the
worker process. Wrap the SMTP session lifecycle so cleanup always runs on error
paths. [missing cleanup]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ Repeated SMTP timeouts leak sockets in Celery worker process.
- ⚠️ Alert and report email failures increase worker resource usage.
- ⚠️ Dashboard deactivation emails may leak SMTP connections.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Configure an EMAIL report recipient so that the scheduler uses the report
execution
flow in `superset/commands/report/execute.py:_send` (lines 740–799), which
iterates
`recipients` and calls `notification.send()` for each via
`create_notification(...)` from
`superset/reports/notifications/__init__.py:25-31`.
2. Ensure the recipient type is EMAIL so `create_notification` instantiates
`EmailNotification` from `superset/reports/notifications/email.py:80-86`;
its `send()`
method at lines 33-56 calls `send_email_smtp(...)` from
`superset/utils/core.py` with
`current_app.config` and `dryrun=False`.
3. Inside `superset/utils/core.py:send_email_smtp` (lines 829-879), the
function builds a
`MIMEMultipart` and calls `send_mime_email(smtp_mail_from, recipients, msg,
config,
dryrun=dryrun)` at line 5 of the earlier snippet, which enters
`send_mime_email` defined
at `superset/utils/core.py:927-49;` there the SMTP client is created at
lines 955-961
using `smtplib.SMTP_SSL(..., timeout=smtp_timeout)` or `smtplib.SMTP(...,
timeout=smtp_timeout)`, and only after `smtp.starttls()`, `smtp.login()`, and
`smtp.sendmail()` does it call `smtp.quit()` at line 49.
4. Point `SMTP_HOST`/`SMTP_PORT` in `current_app.config` to an SMTP endpoint
that accepts
TCP but hangs longer than `SMTP_TIMEOUT` during `starttls`, `login`, or
`sendmail`,
causing those calls in `send_mime_email` (lines 43-48) to raise (e.g.
`socket.timeout`);
because there is no `try/finally` or `with smtplib.SMTP(...)` context
manager anywhere in
the repo (confirmed by a `Grep` search for `with smtplib.SMTP` returning no
matches), the
exception unwinds out of `send_mime_email` before reaching `smtp.quit()`, so
the SMTP
socket is not explicitly closed and can accumulate across repeated failing
sends in the
long-lived worker process.
```
</details>
[](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=d355d61a579d412c84559a6b71a86511&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
[](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=d355d61a579d412c84559a6b71a86511&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/utils/core.py
**Line:** 955:961
**Comment:**
*Missing Cleanup: The SMTP connection is not protected by a
`try/finally` (or context manager), so when the new timeout causes `starttls`,
`login`, or `sendmail` to raise, `quit()` is skipped and sockets can accumulate
in the worker process. Wrap the SMTP session lifecycle so cleanup always runs
on error paths.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41250&comment_hash=3e1ec04d4520acc344ad3082ef1c5ceb852af8b894ed4b9cd4e4e8bdfef10f8f&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41250&comment_hash=3e1ec04d4520acc344ad3082ef1c5ceb852af8b894ed4b9cd4e4e8bdfef10f8f&reaction=dislike'>👎</a>
##########
superset/utils/webdriver.py:
##########
@@ -462,6 +462,11 @@ def driver(self) -> WebDriver:
if not self._driver:
raise RuntimeError("WebDriver creation failed")
self._driver.set_window_size(*self._window)
+ # Bound driver.get() so an unreachable page raises a
TimeoutException
+ # instead of blocking the worker (and the report schedule) forever.
+ page_load_wait = app.config["SCREENSHOT_PAGE_LOAD_WAIT"]
+ if page_load_wait is not None:
+ self._driver.set_page_load_timeout(page_load_wait)
Review Comment:
**Suggestion:** The new page-load timeout can now raise during driver
initialization, but there is no rollback of the already-assigned driver
instance. If an exception is raised here (or in the immediate auth step that
now uses this timeout), `_driver` stays cached in a partially
initialized/unauthenticated state, and later calls will reuse it instead of
recreating and re-authenticating. Wrap initialization/auth timeout setup in
failure cleanup that resets/destroys `_driver` before re-raising. [stale
reference]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Cache warmup can reuse unauthenticated Selenium drivers.
- ⚠️ Page-load timeout may remain unset after initialization failure.
- ⚠️ Subsequent warmup URLs may behave inconsistently or hang.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. In `superset/tasks/cache.py:256-268` the cache warmup task constructs a
long-lived
`WebDriverSelenium` instance via `wd =
WebDriverSelenium(current_app.config["WEBDRIVER_TYPE"], user=user)` and
reuses it inside a
loop, only calling `wd.destroy()` once in the outer `finally` block after
all URLs are
processed.
2. For each URL, `wd.get_screenshot(url, "grid-container")` (defined in
`superset/utils/webdriver.py:687-745`) calls `driver = self.driver`, which
executes the
`driver` property initialization logic at
`superset/utils/webdriver.py:109-120`, including
`_create()`, `set_window_size`, the new `set_page_load_timeout`, and
`_auth(self._user)`.
3. If any of these initialization steps after `_create()` raise (for example,
`self._driver.set_page_load_timeout(page_load_wait)` at
`superset/utils/webdriver.py:118-120` or `_auth(self._user)` at
`superset/utils/webdriver.py:20-26` raising a `WebDriverException` or
another error), the
exception propagates back to `wd.get_screenshot`, is caught by the inner
`except
Exception` in `cache.py:259-266`, and the loop continues, but `_driver`
remains set to the
partially initialized instance because there is no cleanup in the `driver`
property.
4. On the next iteration of the warmup loop, `wd.get_screenshot()` calls
`self.driver`
again; since `self._driver` is now truthy, the property returns the cached
instance
without re-running `set_window_size`, `set_page_load_timeout`, or `_auth`,
so subsequent
screenshot calls reuse a possibly unauthenticated or misconfigured driver
until
`wd.destroy()` eventually runs at the end of the task, confirming the
suggestion that
initialization-time failures can leave a stale cached driver.
```
</details>
[](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=1ef71d12b033431fa6b52a1d139b0599&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
[](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=1ef71d12b033431fa6b52a1d139b0599&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/utils/webdriver.py
**Line:** 465:469
**Comment:**
*Stale Reference: The new page-load timeout can now raise during driver
initialization, but there is no rollback of the already-assigned driver
instance. If an exception is raised here (or in the immediate auth step that
now uses this timeout), `_driver` stays cached in a partially
initialized/unauthenticated state, and later calls will reuse it instead of
recreating and re-authenticating. Wrap initialization/auth timeout setup in
failure cleanup that resets/destroys `_driver` before re-raising.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41250&comment_hash=f8f2dedb4058160018271006198995a078de7fc16047676996d2f37051e9be4c&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41250&comment_hash=f8f2dedb4058160018271006198995a078de7fc16047676996d2f37051e9be4c&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]