codeant-ai-for-open-source[bot] commented on code in PR #41306:
URL: https://github.com/apache/superset/pull/41306#discussion_r3456371013
##########
superset/views/base_api.py:
##########
@@ -223,6 +223,27 @@ def incr_stats(self, action: str, func_name: str) -> None:
f"{self.__class__.__name__}.{func_name}.{action}"
)
+ def log_rejected_field_access(self, func_name: str, column_name: str) ->
None:
+ """Emit a security log event when a related/distinct field is rejected.
+
+ The allowlist check itself blocks the request; this records the attempt
+ in the structured log (alongside the existing statsd counter) so that
+ rejected field access is visible to security monitoring and forensics,
+ with the caller's identity, the endpoint, and the attempted value.
+ """
+ # Sanitize the user-supplied column name to a single, bounded token so
+ # it cannot inject newlines or forge extra log lines.
+ sanitized_column = "".join(
+ ch for ch in str(column_name) if ch.isprintable() and ch not in
"\r\n"
+ )[:200]
Review Comment:
**Suggestion:** The sanitization still allows spaces and punctuation like
`=` in `column_name`, so an attacker can send values such as `foo
endpoint=admin` and forge extra key/value tokens in the log line. This breaks
the intended "single token" guarantee and can corrupt security-log parsing.
Restrict the value to a safe token character set (for example alphanumeric plus
`_-.`) or encode it before logging. [security]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Related endpoint logs can contain attacker-crafted endpoint tokens.
- ⚠️ Distinct endpoint security logs may mislead incident investigations.
- ⚠️ `test_get_related_fail` asserts logging but not token safety.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Start Superset with this PR code so that `BaseSupersetModelRestApi` in
`superset/views/base_api.py` is used by its subclasses (for example
`ChartRestApi` in
`superset/charts/api.py:11-27`, which includes `RouteMethod.RELATED` and
therefore exposes
the `/related/<column_name>` endpoint).
2. As an authenticated user (mirroring
`tests/integration_tests/base_api_tests.BaseApiOwnersTestCaseMixin.test_get_related_fail`
at `tests/integration_tests/base_api_tests.py:10-18`), issue a GET request
to a related
endpoint such as `GET /api/v1/chart/related/foo endpoint=admin` where
`column_name` in the
URL path is the literal string `foo endpoint=admin` (including the space and
`=`).
3. The request is handled by `BaseSupersetModelRestApi.related` in
`superset/views/base_api.py:11-25`, which checks `if column_name not in
self.allowed_rel_fields:` at `superset/views/base_api.py:22`. Because `"foo
endpoint=admin"` is not in the allowlist, the method calls
`self.log_rejected_field_access(self.related.__name__, column_name)` at
`superset/views/base_api.py:24`.
4. Inside `log_rejected_field_access` at `superset/views/base_api.py:7-25`,
the
`sanitized_column` is computed using the existing code snippet at lines
236-238, which
preserves spaces and `=` characters in `column_name`. The logger call at
`superset/views/base_api.py:20-25` then emits a message like `Rejected
disallowed field
access: user_id=<id> endpoint=ChartRestApi.related column=foo
endpoint=admin`. This
contradicts the comment promise of a "single, bounded token" and allows the
attacker-controlled `column_name` to introduce an extra `endpoint=admin`
token into the
log line, which can corrupt downstream security-log parsing that relies on
`key=value`
tokens (for example, a parser using `endpoint=([^ ]+)` will read the
attacker-controlled
value instead of the real endpoint).
```
</details>
[](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=4688176516584d4db635880860efb79c&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
[](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=4688176516584d4db635880860efb79c&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/views/base_api.py
**Line:** 236:238
**Comment:**
*Security: The sanitization still allows spaces and punctuation like
`=` in `column_name`, so an attacker can send values such as `foo
endpoint=admin` and forge extra key/value tokens in the log line. This breaks
the intended "single token" guarantee and can corrupt security-log parsing.
Restrict the value to a safe token character set (for example alphanumeric plus
`_-.`) or encode it before logging.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41306&comment_hash=7bb1e7ff891c28c3ebd2ede2d33377c063b343dd9316b876e5929e954e9e50c6&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41306&comment_hash=7bb1e7ff891c28c3ebd2ede2d33377c063b343dd9316b876e5929e954e9e50c6&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]