codeant-ai-for-open-source[bot] commented on code in PR #39301:
URL: https://github.com/apache/superset/pull/39301#discussion_r3408325824
##########
superset/commands/dataset/importers/v1/utils.py:
##########
@@ -88,12 +117,33 @@ def get_dtype(df: pd.DataFrame, dataset: SqlaTable) ->
dict[str, VisitableType]:
def validate_data_uri(data_uri: str) -> None:
"""
- Validate that the data URI is configured on DATASET_IMPORT_ALLOWED_URLS
- has a valid URL.
+ Validate that the data URI is permitted for dataset import.
+
+ Local ``file://`` URIs are allowed only when the path is confined to the
+ bundled examples folder. All other URIs must match a pattern in
+ ``DATASET_IMPORT_ALLOWED_DATA_URLS`` *and* resolve to a publicly-routable
host.
- :param data_uri:
- :return:
+ :param data_uri: the URI to validate
+ :raises DatasetForbiddenDataURI: if the URI is not permitted
"""
+ if data_uri.startswith("file://"):
Review Comment:
**Suggestion:** The file-scheme gate is case-sensitive, so mixed-case
variants like `FiLe://...` bypass the `file://` sandbox check. When
`DATASET_IMPORT_ALLOW_INTERNAL_DATA_URLS` is enabled and the allowlist regex
matches broadly, this can allow arbitrary local file reads outside the examples
directory. Parse the URI scheme via `urlparse(data_uri).scheme.lower()` and
apply the examples-folder restriction to any `file` scheme regardless of input
casing. [security]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ Dataset import can read arbitrary local files via redirects.
- ❌ Alerts/reports using imported datasets may expose host files.
- ⚠️ Air-gapped deployments enabling internal URLs become exploitable.
- ⚠️ Validation logic contradicts file-URI safety comment in config.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. In configuration, enable internal data URLs by setting
`DATASET_IMPORT_ALLOW_INTERNAL_DATA_URLS = True` in
`superset/config.py:10-14` and keep
the default broad allowlist `DATASET_IMPORT_ALLOWED_DATA_URLS = [r".*"]` as
defined in
`superset/config.py:1-8`.
2. Trigger a v1 dataset import that carries a `data` URL, e.g. via the
dashboard/dataset
import flow where `import_dataset(config, overwrite=overwrite)` is invoked in
`superset/commands/dataset/importers/v1/__init__.py:10-18`;
`import_dataset()` then
assigns `data_uri = config.get("data")` and calls `load_data(data_uri,
dataset,
dataset.database)` when `data_uri` is present (see
`superset/commands/dataset/importers/v1/utils.py:248-286`).
3. In `load_data()`
(`superset/commands/dataset/importers/v1/utils.py:327-343`), an
HTTP(S) source is opened with an `urllib` opener that installs
`_ValidatingRedirectHandler` (`utils.py:51-72`); when the remote server
responds with a
redirect whose `Location` is a mixed-case file URI such as
`FiLe:///etc/passwd`,
`_ValidatingRedirectHandler.redirect_request()` builds `absolute_url` and
calls
`validate_data_uri(absolute_url)` at `utils.py:70-71`.
4. Inside `validate_data_uri()` (`utils.py:118-164`), the mixed-case
`FiLe://` URI does
not satisfy the case-sensitive `if data_uri.startswith("file://"):` check at
line 129, so
the examples-folder restriction and `DatasetForbiddenDataURI` checks for
file URIs are
skipped; the URI instead matches the broad
`DATASET_IMPORT_ALLOWED_DATA_URLS` regex at
`utils.py:147-156`, and because `DATASET_IMPORT_ALLOW_INTERNAL_DATA_URLS` is
`True` at
`utils.py:157-163`, no hostname validation is applied to the file URI,
causing the
redirect target `FiLe:///etc/passwd` to be treated as allowed and
subsequently opened by
`opener.open(data_uri)` in `load_data()` (line 342), enabling reads of
arbitrary local
files outside the examples directory.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=81358197fbcd4b63ae3d0031c65a8f71&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=81358197fbcd4b63ae3d0031c65a8f71&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/commands/dataset/importers/v1/utils.py
**Line:** 129:129
**Comment:**
*Security: The file-scheme gate is case-sensitive, so mixed-case
variants like `FiLe://...` bypass the `file://` sandbox check. When
`DATASET_IMPORT_ALLOW_INTERNAL_DATA_URLS` is enabled and the allowlist regex
matches broadly, this can allow arbitrary local file reads outside the examples
directory. Parse the URI scheme via `urlparse(data_uri).scheme.lower()` and
apply the examples-folder restriction to any `file` scheme regardless of input
casing.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F39301&comment_hash=2133958eaf2579ada421bd7e5bda4ea72e024c7c0745ef7865b1fdbed10b99da&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F39301&comment_hash=2133958eaf2579ada421bd7e5bda4ea72e024c7c0745ef7865b1fdbed10b99da&reaction=dislike'>👎</a>
##########
superset/reports/notifications/webhook.py:
##########
@@ -121,9 +151,17 @@ def send(self) -> None:
else:
data[key] = value
- response = requests.post(wh_url, data=data, files=files,
timeout=60)
+ response = requests.post(
+ wh_url,
+ data=data,
+ files=files,
+ timeout=60,
+ allow_redirects=False,
+ )
Review Comment:
**Suggestion:** Redirects are now disabled, but the response handling still
treats 3xx statuses as success because only `>=400` is considered failure. This
can silently drop notifications when webhook endpoints return 301/302/307,
since no request is sent to the final target while `send()` logs success.
Either add explicit 3xx failure handling (and validate `Location` safely) or
restore safe redirect-follow behavior with host revalidation. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ Webhook-based alerts can be silently dropped on HTTP 3xx.
- ⚠️ Report scheduler logs success for undelivered webhook notifications.
- ⚠️ Operators cannot reliably detect misconfigured webhook endpoints.
- ⚠️ Backoff/retry logic never triggers for persistent 3xx responses.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Configure and enable alerts/reports webhooks so that
`feature_flag_manager.is_feature_enabled("ALERT_REPORT_WEBHOOK")` returns
True (checked in
`WebhookNotification.send()` at
`superset/reports/notifications/webhook.py:135-139`), and
create a report schedule with a webhook recipient (type
`ReportRecipientType.WEBHOOK`)
targeting an endpoint that responds with HTTP 302 or 307.
2. When the report executes, `send_notification_to_recipients()` in
`superset/commands/report/execute.py:1-23` iterates over `recipients`, calls
`create_notification(recipient, notification_content)` from
`superset/reports/notifications/__init__.py:25-34`, and for a webhook
recipient
instantiates `WebhookNotification` and invokes `notification.send()` at
`execute.py:22-23`.
3. Inside `WebhookNotification.send()`
(`superset/reports/notifications/webhook.py:134-179`), after
`_validate_webhook_url()`
passes, the method issues an HTTP request using `requests.post(wh_url,
data=data,
files=files, timeout=60, allow_redirects=False)` for payloads with files
(lines 154-160)
or `requests.post(wh_url, json=payload, timeout=60, allow_redirects=False)`
otherwise
(lines 161-164); if the endpoint returns a redirect (e.g. 302), `requests`
returns a
response with `status_code=302` and does not follow the redirect.
4. The subsequent response handling at `webhook.py:166-179` logs `"Webhook
sent to %s,
status code: %s"` regardless of 3xx, then raises only on
`response.status_code >= 500 or
== 429` (transient) and `response.status_code >= 400` (parameter error);
because 3xx codes
are `< 400`, no exception is thrown, `send()` returns successfully to
`send_notification_to_recipients()`, and the scheduler treats the
notification as
delivered even though no request was sent to the ultimate target URL and the
webhook may
never be processed.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=96b0f3b71bd1488c9c81b9c98e577d0a&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=96b0f3b71bd1488c9c81b9c98e577d0a&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/reports/notifications/webhook.py
**Line:** 154:160
**Comment:**
*Logic Error: Redirects are now disabled, but the response handling
still treats 3xx statuses as success because only `>=400` is considered
failure. This can silently drop notifications when webhook endpoints return
301/302/307, since no request is sent to the final target while `send()` logs
success. Either add explicit 3xx failure handling (and validate `Location`
safely) or restore safe redirect-follow behavior with host revalidation.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F39301&comment_hash=9ff1953c37e85e73416b9d8e96df89df115b2b1f8d465d63a1275f5980614e1a&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F39301&comment_hash=9ff1953c37e85e73416b9d8e96df89df115b2b1f8d465d63a1275f5980614e1a&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]