codeant-ai-for-open-source[bot] commented on code in PR #41217:
URL: https://github.com/apache/superset/pull/41217#discussion_r3440255928
##########
superset/security/api.py:
##########
@@ -74,7 +74,26 @@ def convert_enum_to_value( # pylint: disable=unused-argument
return data
-class RlsRuleSchema(PermissiveSchema):
+class RlsRuleSchema(Schema):
+ """
+ Schema for a single row-level security rule attached to a guest token.
+
+ Unlike the other guest-token schemas, this one rejects unknown fields
+ instead of silently dropping them. A rule is scoped to a dataset only when
+ it carries a valid integer ``dataset`` key; a rule with no ``dataset`` is
+ treated as global and its ``clause`` is applied to every dataset the
+ embedded resource can reach (see ``get_guest_rls_filters``). Silently
+ excluding an unexpected field -- most commonly a mistyped or legacy scope
+ key such as ``datasource`` -- would therefore turn an intended
+ dataset-scoped rule into a global one without any feedback to the caller.
+ Raising on unknown fields surfaces the mistake as an HTTP 400 before a
+ token is ever issued and keeps the accepted payload aligned with the
+ documented ``RlsRule`` contract (``dataset`` and ``clause``).
+ """
+
+ class Meta: # pylint: disable=too-few-public-methods
+ unknown = RAISE
+
dataset = fields.Integer()
Review Comment:
**Suggestion:** `dataset` is currently validated as any integer, which
allows `0` (and boolean `false`, which deserializes to `0`) to pass schema
validation. Downstream, guest RLS evaluation treats falsy `dataset` values as
"global" (`if not rule.get("dataset")`), so an intended scoped rule can still
be widened to all datasets. Constrain `dataset` to positive IDs (and ideally
strict integer input) so invalid/falsy values are rejected at validation time.
[falsy zero check]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ Guest RLS rules unexpectedly apply globally across all datasets.
- ⚠️ Embedded dashboards may leak data from unauthorized datasets.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Hit the guest token endpoint `POST /api/v1/security/guest_token/`
implemented by
`SecurityRestApi.guest_token` in `superset/security/api.py:168-201`, which
deserializes
the JSON body with `guest_token_create_schema.load(request.json)` at
`superset/security/api.py:203`.
2. Provide a payload where an RLS rule has a falsy dataset value, for
example:
`{"resources": [{"type": "dashboard", "id": "abc"}], "rls": [{"dataset":
0, "clause":
"tenant_id = 1"}]}`; this is validated by `GuestTokenCreateSchema` at
`superset/security/api.py:101-116`, whose `rls` field nests
`RlsRuleSchema`, and
`RlsRuleSchema.dataset = fields.Integer()` at
`superset/security/api.py:97`, so `0` (or
`false`, which marshmallow treats as `0`) passes validation and is loaded
unchanged
instead of raising `ValidationError`.
3. The successfully loaded body is passed into
`self.appbuilder.sm.create_guest_access_token(body.get("user", {}),
body["resources"],
body["rls"], ...)` at `superset/security/api.py:220-224`;
`create_guest_access_token` in
`superset/security/manager.py:3872-33` embeds the `rls` list as the
`rls_rules` claim, so
the minted guest JWT now contains an RLS rule `{"dataset": 0, "clause":
"tenant_id = 1"}`.
4. When the embedded guest user later queries any dataset,
`get_guest_user_from_request`
and `get_guest_user_from_token` (see `superset/security/manager.py:47-80`)
reconstruct a
`GuestUser` whose `rls` attribute includes that rule; during query
construction,
`security_manager.get_guest_rls_filters(self)` is called from
`superset/connectors/sqla/models.py:19-26`, which delegates to
`SupersetSecurityManager.get_guest_rls_filters` at
`superset/security/manager.py:13-28`.
That method filters rules with `if not rule.get("dataset") or
str(rule.get("dataset")) ==
str(dataset.data["id"])`; for the rule where `dataset` is `0`, `not
rule.get("dataset")`
evaluates to `True`, so the rule is treated as global and returned for every
dataset
instead of being scoped, causing its `clause` to be applied across all
datasets reachable
by the embedded resource.
```
</details>
[](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=47c05c2bc5bf477fb74347313fdd9044&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
[](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=47c05c2bc5bf477fb74347313fdd9044&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/security/api.py
**Line:** 97:97
**Comment:**
*Falsy Zero Check: `dataset` is currently validated as any integer,
which allows `0` (and boolean `false`, which deserializes to `0`) to pass
schema validation. Downstream, guest RLS evaluation treats falsy `dataset`
values as "global" (`if not rule.get("dataset")`), so an intended scoped rule
can still be widened to all datasets. Constrain `dataset` to positive IDs (and
ideally strict integer input) so invalid/falsy values are rejected at
validation time.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41217&comment_hash=4219fbb8f8f96d60dbbc33aff0615867cfa6f23bf7cd1127d25e42ebd40441d9&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41217&comment_hash=4219fbb8f8f96d60dbbc33aff0615867cfa6f23bf7cd1127d25e42ebd40441d9&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]