codeant-ai-for-open-source[bot] commented on code in PR #41218:
URL: https://github.com/apache/superset/pull/41218#discussion_r3440344055


##########
superset/security/manager.py:
##########
@@ -520,42 +520,87 @@ def _native_filter_request_modified(query_context: 
"QueryContext") -> bool:
     )
 
 
-def query_context_modified(query_context: "QueryContext") -> bool:
+def _collect_sortable_identifiers(
+    stored_chart: "Slice",
+    stored_query_context: Optional[dict[str, Any]],
+) -> set[str]:
     """
-    Check if a query context has been modified.
-
-    This is used to ensure guest users don't modify the payload and fetch data
-    different from what was shared with them in dashboards.
+    Frozen column names and metric labels/definitions a guest may legitimately
+    sort by: every column or metric the stored chart already references.
+
+    Order-by only changes the ordering of the result, not which data is read, 
so
+    any column or metric already part of the chart is a safe sort target. A 
term
+    that is not present in the stored chart (for example a free-form 
``random()``
+    expression) cannot be validated and must be rejected. Order-by entries are
+    ``(column_or_metric, ascending)`` pairs, so only their first element is
+    collected.
     """
-    form_data = query_context.form_data
-    stored_chart = query_context.slice_
+    allowed: set[str] = set()
 
-    # Native-filter data requests have no associated chart (no slice_id). 
Rather
-    # than accepting any payload, constrain them to the column(s) the 
dashboard's
-    # native filter is allowed to target; other chartless paths keep prior
-    # behavior (see _native_filter_request_modified).
-    if stored_chart is None:
-        return _native_filter_request_modified(query_context)
+    def add(values: Any) -> None:
+        for value in values or []:
+            allowed.add(freeze_value(value))
 
-    if form_data is None:
-        return False
+    def add_orderby(entries: Any) -> None:
+        for entry in entries or []:
+            if isinstance(entry, (list, tuple)) and entry:
+                allowed.add(freeze_value(entry[0]))
 
-    # cannot request a different chart
-    if form_data.get("slice_id") != stored_chart.id:
-        return True
+    params = stored_chart.params_dict
+    for key in ("columns", "groupby", "metrics", "all_columns"):
+        add(params.get(key))
+    add_orderby(params.get("orderby"))
 
-    stored_query_context = (
-        json.loads(cast(str, stored_chart.query_context))
-        if stored_chart.query_context
-        else None
-    )
+    if stored_query_context:
+        for query in stored_query_context.get("queries") or []:
+            for key in ("columns", "groupby", "metrics"):
+                add(query.get(key))
+            add_orderby(query.get("orderby"))
+
+    return allowed
 
-    # compare columns and metrics in form_data with stored values
+
+def _orderby_modified(
+    query_context: "QueryContext",
+    stored_chart: "Slice",
+    stored_query_context: Optional[dict[str, Any]],
+) -> bool:
+    """
+    Whether any order-by clause sorts by a term the stored chart does not 
already
+    reference.
+
+    A guest reordering an embedded chart by one of its existing columns or
+    metrics is legitimate and must not read as tampering; introducing a new
+    expression is not, and is rejected.
+    """
+    allowed = _collect_sortable_identifiers(stored_chart, stored_query_context)
+    form_data = query_context.form_data or {}
+    requested = list(form_data.get("orderby") or [])
+    for query in query_context.queries:
+        requested.extend(getattr(query, "orderby", None) or [])
+
+    for entry in requested:
+        term = entry[0] if isinstance(entry, (list, tuple)) and entry else 
entry
+        if freeze_value(term) not in allowed:
+            return True

Review Comment:
   **Suggestion:** The new order-by check validates only the first element of 
each entry but never validates the entry shape. A malformed payload like 
`orderby=["gender"]` or `orderby=[["gender"]]` can pass this guard (because the 
term is allowed) and then fail later when query code unpacks each order-by item 
as `(column, ascending)`, causing a runtime error for guest requests. Reject 
order-by entries unless they are exactly two-item list/tuple pairs with a 
boolean direction. [api mismatch]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Malformed orderby can crash chart compile/query execution.
   - ❌ Guest payload guard fails to reject structurally invalid input.
   - ⚠️ Embedded/ChartData callers can trigger unexpected 500 errors.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Trigger chart compilation via `_compile_chart` in
   `superset/mcp_service/chart/compile.py:44-55` with a form_data payload 
containing a
   malformed order-by clause, e.g. `form_data["orderby"] = ["gender"]` and a 
valid
   metric/column `"gender"` on the chosen dataset.
   
   2. `_compile_chart` builds a `QueryContext` by calling 
`QueryContextFactory().create(...)`
   and passes `queries=[{"columns": columns, "metrics": metrics, "orderby":
   form_data.get("orderby", []), ...}]` (compile.py:42-55) so the malformed 
`["gender"]` list
   becomes the `orderby` field for the first `QueryObject`.
   
   3. `ChartDataCommand.validate()` in
   `superset/commands/chart/data/get_data_command.py:72-73` calls
   `QueryContext.raise_for_access()`, which delegates to
   `security_manager.raise_for_access(..., query_context=query_context)`; in
   `SupersetSecurityManager.raise_for_access` 
(`superset/security/manager.py:3436-3450`)
   guest users invoke `query_context_modified(query_context)`, which calls
   `_orderby_modified` (`superset/security/manager.py:84-107`). Inside 
`_orderby_modified`,
   the loop at lines 582‑585 iterates `entry = "gender"`, computes `term = 
entry` (not a
   list/tuple), finds `freeze_value(term)` in the allowed set from
   `_collect_sortable_identifiers`, and therefore does not flag the malformed 
shape as
   tampering.
   
   4. `ChartDataCommand.run()` then executes the query via 
`QueryContext.get_payload` →
   `QueryContextProcessor.get_df_payload`
   (`superset/common/query_context_processor.py:82-90`) → datasource query 
building, which
   ultimately reaches `superset/models/helpers.py:3170-3172` where `for 
orig_col, ascending
   in orderby:` attempts to unpack each `orderby` element. Because `orderby` 
contains a bare
   string `"gender"` instead of `(column, bool)`, Python raises a `ValueError` 
during
   unpacking, causing a 500 error for this request instead of being rejected 
early as an
   invalid/tampered guest payload.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=83c77c93a5d940e3868c710ba5b07c4f&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=83c77c93a5d940e3868c710ba5b07c4f&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/security/manager.py
   **Line:** 582:585
   **Comment:**
        *Api Mismatch: The new order-by check validates only the first element 
of each entry but never validates the entry shape. A malformed payload like 
`orderby=["gender"]` or `orderby=[["gender"]]` can pass this guard (because the 
term is allowed) and then fail later when query code unpacks each order-by item 
as `(column, ascending)`, causing a runtime error for guest requests. Reject 
order-by entries unless they are exactly two-item list/tuple pairs with a 
boolean direction.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41218&comment_hash=0976d53aaaa81935ccfc712039aba97b9b6d9ebbb43bc7e5098e242d2e877adf&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41218&comment_hash=0976d53aaaa81935ccfc712039aba97b9b6d9ebbb43bc7e5098e242d2e877adf&reaction=dislike'>👎</a>



##########
superset/security/manager.py:
##########
@@ -520,42 +520,87 @@ def _native_filter_request_modified(query_context: 
"QueryContext") -> bool:
     )
 
 
-def query_context_modified(query_context: "QueryContext") -> bool:
+def _collect_sortable_identifiers(
+    stored_chart: "Slice",
+    stored_query_context: Optional[dict[str, Any]],
+) -> set[str]:
     """
-    Check if a query context has been modified.
-
-    This is used to ensure guest users don't modify the payload and fetch data
-    different from what was shared with them in dashboards.
+    Frozen column names and metric labels/definitions a guest may legitimately
+    sort by: every column or metric the stored chart already references.
+
+    Order-by only changes the ordering of the result, not which data is read, 
so
+    any column or metric already part of the chart is a safe sort target. A 
term
+    that is not present in the stored chart (for example a free-form 
``random()``
+    expression) cannot be validated and must be rejected. Order-by entries are
+    ``(column_or_metric, ascending)`` pairs, so only their first element is
+    collected.
     """
-    form_data = query_context.form_data
-    stored_chart = query_context.slice_
+    allowed: set[str] = set()
 
-    # Native-filter data requests have no associated chart (no slice_id). 
Rather
-    # than accepting any payload, constrain them to the column(s) the 
dashboard's
-    # native filter is allowed to target; other chartless paths keep prior
-    # behavior (see _native_filter_request_modified).
-    if stored_chart is None:
-        return _native_filter_request_modified(query_context)
+    def add(values: Any) -> None:
+        for value in values or []:
+            allowed.add(freeze_value(value))
 
-    if form_data is None:
-        return False
+    def add_orderby(entries: Any) -> None:
+        for entry in entries or []:
+            if isinstance(entry, (list, tuple)) and entry:
+                allowed.add(freeze_value(entry[0]))
 
-    # cannot request a different chart
-    if form_data.get("slice_id") != stored_chart.id:
-        return True
+    params = stored_chart.params_dict
+    for key in ("columns", "groupby", "metrics", "all_columns"):
+        add(params.get(key))

Review Comment:
   **Suggestion:** Sortable identifiers are collected from 
`columns/groupby/metrics/all_columns` only, but many legacy chart payloads 
store a single metric under `metric` (singular). For charts without stored 
`query_context`, this misses the chart's actual metric and wrongly flags 
legitimate guest sorts by that metric as tampering. Include legacy singular 
metric fields when building the allowed sort target set. [incomplete 
implementation]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Embedded guests blocked on legacy metric-based charts.
   - ⚠️ Legitimate metric sorts misclassified as payload tampering.
   - ⚠️ Inconsistent behavior between legacy and modern chart configs.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Consider a legacy Big Number-style chart whose saved form_data (stored on 
the `Slice`)
   uses the singular `"metric"` field instead of the newer `"metrics"` list; 
this legacy
   behavior is documented in `_compile_chart` in
   `superset/mcp_service/chart/compile.py:33-36`, which explicitly treats
   `form_data["metric"]` as the metric when `metrics` is empty.
   
   2. When this chart is embedded and loaded as a guest, the backend 
reconstructs a
   `QueryContext` (via `QueryContextFactory.create`,
   `superset/common/query_context_factory.py:88-19`), and
   `query_context_modified(query_context)` in 
`superset/security/manager.py:152-195` is
   invoked from `SupersetSecurityManager.raise_for_access` 
(`security/manager.py:3436-3450`)
   to enforce guest payload integrity.
   
   3. Inside `query_context_modified`, 
`_collect_sortable_identifiers(stored_chart,
   stored_query_context)` is called (`security/manager.py:44-81`). For charts 
without a
   stored `query_context` JSON, `stored_query_context` is `None`, so the 
allowed set is built
   only from `stored_chart.params_dict`. The loop at lines 550‑551 iterates 
keys `("columns",
   "groupby", "metrics", "all_columns")` and never reads 
`params_dict["metric"]`, so the
   chart's actual legacy metric is omitted from the `allowed` sort identifiers.
   
   4. If the chart's query uses that metric in `orderby` (for example, Big 
Number's
   `sort_by_metric` logic sets `query_obj["orderby"] = 
[(query_obj["metrics"][0], False)]` in
   `superset/viz.py:1277-1279`), then `_orderby_modified` 
(`security/manager.py:84-107`) will
   see each order-by term's `freeze_value(term)` not present in the `allowed` 
set and return
   `True`. This causes `query_context_modified` to return `True` and
   `SupersetSecurityManager.raise_for_access` to raise a 
`SupersetSecurityException` with
   message "Guest user cannot modify chart payload" 
(`security/manager.py:3436-3450`),
   wrongly blocking legitimate guest views/sorts on such legacy charts.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=e4cb6dfef3e04ef8a46b74e4e9088684&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=e4cb6dfef3e04ef8a46b74e4e9088684&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/security/manager.py
   **Line:** 550:551
   **Comment:**
        *Incomplete Implementation: Sortable identifiers are collected from 
`columns/groupby/metrics/all_columns` only, but many legacy chart payloads 
store a single metric under `metric` (singular). For charts without stored 
`query_context`, this misses the chart's actual metric and wrongly flags 
legitimate guest sorts by that metric as tampering. Include legacy singular 
metric fields when building the allowed sort target set.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41218&comment_hash=912bde1944837a6c7753a34c3c7238f66b708c0a69f8e9623cc59748f360b408&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41218&comment_hash=912bde1944837a6c7753a34c3c7238f66b708c0a69f8e9623cc59748f360b408&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to