codeant-ai-for-open-source[bot] commented on code in PR #40646:
URL: https://github.com/apache/superset/pull/40646#discussion_r3337894066
##########
superset/mcp_service/middleware.py:
##########
@@ -511,7 +511,7 @@ async def on_list_tools(
try:
user = get_user_from_request()
except ValueError as exc:
- if "No authenticated user found" in str(exc):
+ if "Authentication required" in str(exc):
Review Comment:
**Suggestion:** The fail-open branch is keyed off a substring match on
exception text, which is attacker-influenced in some `ValueError` paths (for
example, JWT-derived usernames included in "user not found" messages). A
crafted credential error containing this phrase will be misclassified as "no
auth configured" and incorrectly return the full tools list. Replace this
message-substring check with a stable signal (dedicated exception type,
structured error code, or exact sentinel constant from the auth layer) so
invalid credentials always stay fail-closed. [security]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ Invalid JWT user can still see full tools/list output.
- ⚠️ Unauthorized actors can enumerate MCP tools and metadata.
- ⚠️ Violates intended fail-closed behavior for bad credentials.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Start the MCP server so that the middleware stack built in
`superset/mcp_service/server.py:699-711` is active; this stack includes
`RBACToolVisibilityMiddleware`, which filters every `tools/list` request.
2. Enable JWT auth so `_resolve_user_from_jwt_context()` is used by
`get_user_from_request()` (`superset/mcp_service/auth.py:259-55`), and issue
an access
token whose username claim (e.g., `preferred_username` or `sub`) is the
string
`"Authentication required. No valid credentials provided."` — this value is
returned
unchanged by `default_user_resolver()` in
`superset/mcp_service/mcp_config.py:8-27`.
3. Ensure there is no Superset user with that username so that
`_resolve_user_from_jwt_context()` raises a `ValueError` at
`superset/mcp_service/auth.py:307-311` with the message `JWT authenticated
user
'Authentication required. No valid credentials provided.' not found in
Superset database.
Ensure the user exists before granting MCP access.`; note that the
attacker-controlled
username is interpolated into the error string.
4. From an MCP client, call `tools/list` using this JWT; the request reaches
`RBACToolVisibilityMiddleware.on_list_tools()` at
`superset/mcp_service/middleware.py:498-523`, where
`get_user_from_request()` raises the
`ValueError`, `except ValueError as exc:` is entered, and the substring
check `if
"Authentication required" in str(exc):` at line 514 evaluates True because
the injected
username contains that phrase. The middleware incorrectly treats this as the
"no auth
configured" fail-open case and returns the full `tools` list instead of
failing closed
(returning `[]`) for invalid credentials.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=e1dd35732eaf480a96103b5d4b6d8211&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=e1dd35732eaf480a96103b5d4b6d8211&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/middleware.py
**Line:** 514:514
**Comment:**
*Security: The fail-open branch is keyed off a substring match on
exception text, which is attacker-influenced in some `ValueError` paths (for
example, JWT-derived usernames included in "user not found" messages). A
crafted credential error containing this phrase will be misclassified as "no
auth configured" and incorrectly return the full tools list. Replace this
message-substring check with a stable signal (dedicated exception type,
structured error code, or exact sentinel constant from the auth layer) so
invalid credentials always stay fail-closed.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40646&comment_hash=65cdfd91479f2ff29ef81308aa971d3b3cb848029779dd07948110d2d1fe8187&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40646&comment_hash=65cdfd91479f2ff29ef81308aa971d3b3cb848029779dd07948110d2d1fe8187&reaction=dislike'>👎</a>
##########
superset/mcp_service/auth.py:
##########
@@ -421,29 +421,28 @@ def get_user_from_request() -> User:
if hasattr(g, "user") and g.user:
return g.user
- # No auth source available — raise with diagnostic details
+ # No auth source available. Keep the client-facing message generic so it
+ # does not disclose server configuration; the detailed diagnostics are
+ # logged server-side only.
auth_enabled = current_app.config.get("MCP_AUTH_ENABLED", False)
jwt_configured = bool(
current_app.config.get("MCP_JWKS_URI")
or current_app.config.get("MCP_JWT_PUBLIC_KEY")
or current_app.config.get("MCP_JWT_SECRET")
)
- details = [
- f"No JWT access token in MCP request context "
- f"(MCP_AUTH_ENABLED={auth_enabled}, "
- f"JWT keys configured={jwt_configured})",
- "No API key in Authorization header",
- "MCP_DEV_USERNAME is not configured",
- "g.user was not set by external middleware",
- ]
- configured_prefixes = current_app.config.get("FAB_API_KEY_PREFIXES",
["sst_"])
- prefix_example = configured_prefixes[0] if configured_prefixes else "sst_"
- raise ValueError(
- "No authenticated user found. Tried:\n"
- + "\n".join(f" - {d}" for d in details)
- + f"\n\nEither pass a valid API key (Bearer {prefix_example}...), "
- "JWT token, or configure MCP_DEV_USERNAME for development."
+ dev_username_configured = bool(current_app.config.get("MCP_DEV_USERNAME"))
+ logger.warning(
+ "MCP authentication failed: no valid credentials provided "
+ "(no JWT access token, no API key, no g.user from middleware)"
)
Review Comment:
**Suggestion:** This logs expected unauthenticated requests at WARNING,
which conflicts with the existing fail-open/no-noise flow for `tools/list` and
will produce noisy warning spam in normal dev/internal deployments. Lower this
to DEBUG (or gate WARNING to true credential failures only) so only real auth
failures trigger warning-level alerts. [incorrect log level]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ tools/list RBAC middleware emits spurious auth warning logs.
- ⚠️ Dev/internal MCP deployments see noisy warning-level auth spam.
- ⚠️ Alerting on WARNING may fire on normal unauthenticated traffic.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Start the MCP server with no auth sources configured: `MCP_AUTH_ENABLED`
False or
unset, no `MCP_JWKS_URI`/`MCP_JWT_PUBLIC_KEY`/`MCP_JWT_SECRET`, no
`MCP_DEV_USERNAME`, and
no external middleware setting `g.user` (affects `get_user_from_request()` at
`superset/mcp_service/auth.py:379`–:96).
2. From an MCP client (e.g. Claude, Cursor), trigger a `tools/list` call,
which hits
`RBACToolVisibilityMiddleware.on_list_tools()` in
`superset/mcp_service/middleware.py:19`–:57`. This method calls
`get_user_from_request()`
at `middleware.py:33`.
3. Inside `get_user_from_request()` (auth.py:53–96), JWT and API-key
resolvers return
`None`, `MCP_DEV_USERNAME` is unset, and `g.user` is absent, so execution
reaches the
fallback at auth.py:75–96. There it executes the added `logger.warning("MCP
authentication
failed: no valid credentials provided ...")` at approximately `auth.py:85`
before raising
`ValueError("Authentication required. No valid credentials provided.")`.
4. Back in `RBACToolVisibilityMiddleware.on_list_tools()`, the `ValueError`
is caught at
`middleware.py:34`–:45`. Because the message contains `"Authentication
required"`, the
branch at `middleware.py:35`–:38` treats this as an expected
unauthenticated/dev scenario
and returns the original tool list **without logging anything**. However,
the warning from
step 3 has already been emitted, so every unauthenticated `tools/list` call
in
dev/internal deployments produces a WARNING-level log entry despite being an
expected,
fail-open path.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=abe2bc4c53644fa99cf747124acf7851&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=abe2bc4c53644fa99cf747124acf7851&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/auth.py
**Line:** 434:437
**Comment:**
*Incorrect Log Level: This logs expected unauthenticated requests at
WARNING, which conflicts with the existing fail-open/no-noise flow for
`tools/list` and will produce noisy warning spam in normal dev/internal
deployments. Lower this to DEBUG (or gate WARNING to true credential failures
only) so only real auth failures trigger warning-level alerts.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40646&comment_hash=68b335be2d9a07c987bde283a4a33fc818102a153d14465b7d4acb8dd92b2cfe&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40646&comment_hash=68b335be2d9a07c987bde283a4a33fc818102a153d14465b7d4acb8dd92b2cfe&reaction=dislike'>👎</a>
##########
superset/mcp_service/utils/error_sanitization.py:
##########
@@ -63,8 +66,24 @@ def _get_generic_error_message(error_str: str) -> str | None:
return None
-def _sanitize_validation_error(error: Exception) -> str:
- """SECURITY FIX: Sanitize validation errors to prevent disclosure."""
+def _sanitize_validation_error(error: Exception, log_original: bool = True) ->
str:
+ """SECURITY FIX: Sanitize validation errors to prevent disclosure.
+
+ Args:
+ error: The original exception to sanitize for client-facing output.
+ log_original: When True (default), log the original (unsanitized)
+ error server-side at INFO level before returning the sanitized
+ version. This preserves full diagnostics for operators while the
+ client only ever receives the sanitized message. Set to False to
+ suppress the server-side log (e.g. when the caller already logged).
+ """
+ if log_original:
+ logger.info(
+ "Sanitizing validation error (%s): %s",
+ type(error).__name__,
+ error,
+ )
Review Comment:
**Suggestion:** The new server-side logging writes the raw exception text
before any escaping, so attacker-controlled values containing newlines/tabs can
inject forged log lines and break log integrity. Sanitize control characters
(like the JWT log sanitizer does) before logging the original message, or log a
structured/escaped representation. [security]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Validation pipeline logs attacker-influenced errors without escaping.
- ⚠️ generate_chart tool errors can inject multi-line messages into logs.
- ⚠️ Log-line structure for MCP validation errors can be corrupted.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Run the chart validation pipeline through the MCP generate-chart tool: an
MCP client
calls `generate_chart` in
`superset/mcp_service/chart/tool/generate_chart.py:98`–:169`,
which invokes
`ValidationPipeline.validate_request_with_warnings(request.model_dump())` at
`generate_chart.py:16`–:18`.
2. Inside `ValidationPipeline.validate_request_with_warnings()`
(`superset/mcp_service/chart/validation/pipeline.py:31`–:101`), if any
unexpected
exception is raised in schema/dataset/runtime validation or normalization,
the `except
Exception as e:` block at `pipeline.py:87`–:101` is entered. This block
first logs
`logger.exception("Validation pipeline error")` and then calls
`_sanitize_validation_error(e)` at `pipeline.py:93`–:94`.
3. `_sanitize_validation_error()` in
`superset/mcp_service/utils/error_sanitization.py:69`–:128` is invoked with
its default
`log_original=True`. Before any sanitization, it executes the new logging
block at lines
80–85: `logger.info("Sanitizing validation error (%s): %s",
type(error).__name__, error)`,
which interpolates `str(error)` directly into the log message without
escaping control
characters.
4. The unit test `test_sanitize_logs_original_by_default` in
`superset/tests/unit_tests/mcp_service/utils/test_error_sanitization.py:13`–:29`
confirms
this behavior: it calls `_sanitize_validation_error(ValueError("Invalid
reference to table
secret_revenue in query"))`, then inspects `caplog.records` and asserts that
the INFO log
message contains the raw substring `"secret_revenue"`. If the `error`
message instead
contained attacker-controlled newlines or tabs (e.g. from a Pydantic or
database error
that embeds user-supplied text), those control characters would also be
logged unescaped
at INFO level, allowing crafted exception text to inject extra lines or
confuse
line-oriented log parsers.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=0a7136850c6948f883031b0f92b2098e&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=0a7136850c6948f883031b0f92b2098e&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/utils/error_sanitization.py
**Line:** 80:85
**Comment:**
*Security: The new server-side logging writes the raw exception text
before any escaping, so attacker-controlled values containing newlines/tabs can
inject forged log lines and break log integrity. Sanitize control characters
(like the JWT log sanitizer does) before logging the original message, or log a
structured/escaped representation.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40646&comment_hash=c8f3ec967e4d83f25e87218e93d2e6bd71af67ae10a152f4e04e0dd895fd0576&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40646&comment_hash=c8f3ec967e4d83f25e87218e93d2e6bd71af67ae10a152f4e04e0dd895fd0576&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]