codeant-ai-for-open-source[bot] commented on code in PR #41306:
URL: https://github.com/apache/superset/pull/41306#discussion_r3456371013


##########
superset/views/base_api.py:
##########
@@ -223,6 +223,27 @@ def incr_stats(self, action: str, func_name: str) -> None:
             f"{self.__class__.__name__}.{func_name}.{action}"
         )
 
+    def log_rejected_field_access(self, func_name: str, column_name: str) -> 
None:
+        """Emit a security log event when a related/distinct field is rejected.
+
+        The allowlist check itself blocks the request; this records the attempt
+        in the structured log (alongside the existing statsd counter) so that
+        rejected field access is visible to security monitoring and forensics,
+        with the caller's identity, the endpoint, and the attempted value.
+        """
+        # Sanitize the user-supplied column name to a single, bounded token so
+        # it cannot inject newlines or forge extra log lines.
+        sanitized_column = "".join(
+            ch for ch in str(column_name) if ch.isprintable() and ch not in 
"\r\n"
+        )[:200]

Review Comment:
   **Suggestion:** The sanitization still allows spaces and punctuation like 
`=` in `column_name`, so an attacker can send values such as `foo 
endpoint=admin` and forge extra key/value tokens in the log line. This breaks 
the intended "single token" guarantee and can corrupt security-log parsing. 
Restrict the value to a safe token character set (for example alphanumeric plus 
`_-.`) or encode it before logging. [security]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Related endpoint logs can contain attacker-crafted endpoint tokens.
   - ⚠️ Distinct endpoint security logs may mislead incident investigations.
   - ⚠️ `test_get_related_fail` asserts logging but not token safety.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Start Superset with this PR code so that `BaseSupersetModelRestApi` in
   `superset/views/base_api.py` is used by its subclasses (for example 
`ChartRestApi` in
   `superset/charts/api.py:11-27`, which includes `RouteMethod.RELATED` and 
therefore exposes
   the `/related/<column_name>` endpoint).
   
   2. As an authenticated user (mirroring
   
`tests/integration_tests/base_api_tests.BaseApiOwnersTestCaseMixin.test_get_related_fail`
   at `tests/integration_tests/base_api_tests.py:10-18`), issue a GET request 
to a related
   endpoint such as `GET /api/v1/chart/related/foo endpoint=admin` where 
`column_name` in the
   URL path is the literal string `foo endpoint=admin` (including the space and 
`=`).
   
   3. The request is handled by `BaseSupersetModelRestApi.related` in
   `superset/views/base_api.py:11-25`, which checks `if column_name not in
   self.allowed_rel_fields:` at `superset/views/base_api.py:22`. Because `"foo
   endpoint=admin"` is not in the allowlist, the method calls
   `self.log_rejected_field_access(self.related.__name__, column_name)` at
   `superset/views/base_api.py:24`.
   
   4. Inside `log_rejected_field_access` at `superset/views/base_api.py:7-25`, 
the
   `sanitized_column` is computed using the existing code snippet at lines 
236-238, which
   preserves spaces and `=` characters in `column_name`. The logger call at
   `superset/views/base_api.py:20-25` then emits a message like `Rejected 
disallowed field
   access: user_id=<id> endpoint=ChartRestApi.related column=foo 
endpoint=admin`. This
   contradicts the comment promise of a "single, bounded token" and allows the
   attacker-controlled `column_name` to introduce an extra `endpoint=admin` 
token into the
   log line, which can corrupt downstream security-log parsing that relies on 
`key=value`
   tokens (for example, a parser using `endpoint=([^ ]+)` will read the 
attacker-controlled
   value instead of the real endpoint).
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=4688176516584d4db635880860efb79c&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=4688176516584d4db635880860efb79c&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/views/base_api.py
   **Line:** 236:238
   **Comment:**
        *Security: The sanitization still allows spaces and punctuation like 
`=` in `column_name`, so an attacker can send values such as `foo 
endpoint=admin` and forge extra key/value tokens in the log line. This breaks 
the intended "single token" guarantee and can corrupt security-log parsing. 
Restrict the value to a safe token character set (for example alphanumeric plus 
`_-.`) or encode it before logging.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41306&comment_hash=7bb1e7ff891c28c3ebd2ede2d33377c063b343dd9316b876e5929e954e9e50c6&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41306&comment_hash=7bb1e7ff891c28c3ebd2ede2d33377c063b343dd9316b876e5929e954e9e50c6&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to