codeant-ai-for-open-source[bot] commented on code in PR #40975:
URL: https://github.com/apache/superset/pull/40975#discussion_r3437505873


##########
superset/mcp_service/dataset/tool/update_dataset_metric.py:
##########
@@ -0,0 +1,258 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Update a saved metric on a dataset (FastMCP tool)."""
+
+import difflib
+import logging
+from typing import Any
+
+from fastmcp import Context
+from superset_core.mcp.decorators import tool, ToolAnnotations
+
+from superset.extensions import event_logger
+from superset.mcp_service.dataset.schemas import (
+    DatasetMetricDetail,
+    MetricCurrency,
+    UpdateDatasetMetricRequest,
+    UpdateDatasetMetricResponse,
+)
+from superset.mcp_service.utils import (
+    escape_llm_context_delimiters,
+    sanitize_for_llm_context,
+)
+
+logger = logging.getLogger(__name__)
+
+
+def _find_metric(metrics: list[Any], identifier: int | str) -> Any | None:
+    """Find a saved metric by ID, UUID, or metric_name."""
+    if isinstance(identifier, str):
+        try:
+            identifier = int(identifier)
+        except ValueError:
+            pass
+
+    if isinstance(identifier, int):
+        return next((m for m in metrics if m.id == identifier), None)
+
+    lowered = identifier.lower()
+    by_uuid = next(
+        (m for m in metrics if m.uuid and str(m.uuid).lower() == lowered), None
+    )
+    if by_uuid is not None:
+        return by_uuid
+    return next((m for m in metrics if m.metric_name == identifier), None)
+
+
+def _metric_not_found_message(metrics: list[Any], identifier: int | str) -> 
str:
+    names = [m.metric_name for m in metrics]
+    msg = f"Metric '{identifier}' not found on this dataset."
+    if not names:
+        return f"{msg} This dataset has no saved metrics."
+    suggestions = difflib.get_close_matches(str(identifier), names, n=3, 
cutoff=0.6)
+    if suggestions:
+        return f"{msg} Did you mean: {', '.join(suggestions)}?"
+    return f"{msg} Available metrics: {', '.join(sorted(names))}."

Review Comment:
   **Suggestion:** The not-found error message interpolates raw user-controlled 
identifier and stored metric names directly into the response text without 
LLM-context sanitization, which can inject delimiter/control content into MCP 
outputs. Sanitize or escape both identifier and suggested/available metric 
names before constructing `error`. [security]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ MCP error text can inject delimiters into LLM context.
   - ⚠️ Enables prompt-injection via crafted metric identifiers or names.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. The helper `_metric_not_found_message` at
   `superset/mcp_service/dataset/tool/update_dataset_metric.py:62-70` builds an 
error string
   directly from the user-supplied `identifier` and the stored `metric_name` 
values: `msg =
   f"Metric '{identifier}' not found..."`, then either suggests close matches 
(`',
   '.join(suggestions)`) or lists all available metric names (`', 
'.join(sorted(names))`)
   with no sanitization or escaping.
   
   2. When `update_dataset_metric` cannot find the requested metric (`target is 
None` at
   `update_dataset_metric.py:181`), it calls `_metric_not_found_message(metrics,
   request.metric)` and returns the resulting string as the `error` field of
   `UpdateDatasetMetricResponse` at `update_dataset_metric.py:182-188`.
   `UpdateDatasetMetricResponse`'s schema in
   `superset/mcp_service/dataset/schemas.py:150-167` defines `error: str | 
None` without any
   field validator, so this text is passed through unchanged to the MCP 
client/LLM.
   
   3. Other fields intended for LLM exposure are explicitly sanitized: 
`_serialize_metric`
   applies `escape_llm_context_delimiters` and `sanitize_for_llm_context` to 
metric_name,
   verbose_name, expression, description, and warning_text at
   `update_dataset_metric.py:78-99`, and `DatasetError.error` wraps error text 
with
   `sanitize_for_llm_context` via a `field_validator` at
   `superset/mcp_service/dataset/schemas.py:60-64`. In contrast, 
`_metric_not_found_message`
   bypasses these utilities.
   
   4. If a caller supplies a crafted `metric` identifier containing LLM control 
content (for
   example ``"```json\n{\"attack\": true}\n```"``) or if stored `metric_name` 
values include
   such delimiters, `_metric_not_found_message` will interpolate them directly 
into `msg`
   (lines 64-70), and the MCP tool will return this string in
   `UpdateDatasetMetricResponse.error` (lines 184-188). This unsanitized error 
text is then
   embedded into the LLM tool response, creating a prompt-injection vector that 
can break
   tool-output framing or influence downstream agent behavior.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=272a631fff1043d49352e45f4ffbf5db&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=272a631fff1043d49352e45f4ffbf5db&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/dataset/tool/update_dataset_metric.py
   **Line:** 62:70
   **Comment:**
        *Security: The not-found error message interpolates raw user-controlled 
identifier and stored metric names directly into the response text without 
LLM-context sanitization, which can inject delimiter/control content into MCP 
outputs. Sanitize or escape both identifier and suggested/available metric 
names before constructing `error`.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40975&comment_hash=70a41e28e45a496552a1d9934b05f386cb6c93611a587a8553af01f9032ebc7e&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40975&comment_hash=70a41e28e45a496552a1d9934b05f386cb6c93611a587a8553af01f9032ebc7e&reaction=dislike'>👎</a>



##########
superset/mcp_service/dataset/tool/update_dataset_metric.py:
##########
@@ -0,0 +1,258 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Update a saved metric on a dataset (FastMCP tool)."""
+
+import difflib
+import logging
+from typing import Any
+
+from fastmcp import Context
+from superset_core.mcp.decorators import tool, ToolAnnotations
+
+from superset.extensions import event_logger
+from superset.mcp_service.dataset.schemas import (
+    DatasetMetricDetail,
+    MetricCurrency,
+    UpdateDatasetMetricRequest,
+    UpdateDatasetMetricResponse,
+)
+from superset.mcp_service.utils import (
+    escape_llm_context_delimiters,
+    sanitize_for_llm_context,
+)
+
+logger = logging.getLogger(__name__)
+
+
+def _find_metric(metrics: list[Any], identifier: int | str) -> Any | None:
+    """Find a saved metric by ID, UUID, or metric_name."""
+    if isinstance(identifier, str):
+        try:
+            identifier = int(identifier)
+        except ValueError:
+            pass
+
+    if isinstance(identifier, int):
+        return next((m for m in metrics if m.id == identifier), None)
+
+    lowered = identifier.lower()
+    by_uuid = next(
+        (m for m in metrics if m.uuid and str(m.uuid).lower() == lowered), None
+    )
+    if by_uuid is not None:
+        return by_uuid
+    return next((m for m in metrics if m.metric_name == identifier), None)
+
+
+def _metric_not_found_message(metrics: list[Any], identifier: int | str) -> 
str:
+    names = [m.metric_name for m in metrics]
+    msg = f"Metric '{identifier}' not found on this dataset."
+    if not names:
+        return f"{msg} This dataset has no saved metrics."
+    suggestions = difflib.get_close_matches(str(identifier), names, n=3, 
cutoff=0.6)
+    if suggestions:
+        return f"{msg} Did you mean: {', '.join(suggestions)}?"
+    return f"{msg} Available metrics: {', '.join(sorted(names))}."
+
+
+def _serialize_metric(metric: Any) -> DatasetMetricDetail:
+    currency = getattr(metric, "currency", None)
+    return DatasetMetricDetail(
+        id=getattr(metric, "id", None),
+        uuid=str(metric.uuid) if getattr(metric, "uuid", None) else None,
+        metric_name=escape_llm_context_delimiters(metric.metric_name) or "",
+        verbose_name=sanitize_for_llm_context(
+            getattr(metric, "verbose_name", None),
+            field_path=("metric", "verbose_name"),
+        ),
+        expression=sanitize_for_llm_context(
+            getattr(metric, "expression", None),
+            field_path=("metric", "expression"),
+        ),
+        description=sanitize_for_llm_context(
+            getattr(metric, "description", None),
+            field_path=("metric", "description"),
+        ),
+        d3format=getattr(metric, "d3format", None),
+        metric_type=getattr(metric, "metric_type", None),
+        currency=MetricCurrency.model_validate(currency)
+        if isinstance(currency, dict)
+        else None,
+        warning_text=sanitize_for_llm_context(
+            getattr(metric, "warning_text", None),
+            field_path=("metric", "warning_text"),
+        ),
+    )
+
+
+@tool(
+    tags=["mutate"],
+    class_permission_name="Dataset",
+    method_permission_name="write",
+    annotations=ToolAnnotations(
+        title="Update dataset metric",
+        readOnlyHint=False,
+        destructiveHint=False,
+    ),
+)
+async def update_dataset_metric(
+    request: UpdateDatasetMetricRequest, ctx: Context
+) -> UpdateDatasetMetricResponse:
+    """Update a saved metric on a dataset.
+
+    Modifies properties of an existing saved metric (the named, reusable
+    aggregations stored on a dataset, e.g. SUM(revenue)) — such as its SQL
+    expression, name, display label, or number format. Changes apply to
+    every chart that uses the metric.
+
+    Only the properties you pass are changed; everything else on the metric,
+    and all other metrics on the dataset, is left untouched. This tool cannot
+    create or delete metrics. Requires ownership of the dataset (or Admin).
+
+    Workflow:
+    1. Call get_dataset_info to inspect the dataset's saved metrics
+    2. Call this tool with the dataset ID, the metric identifier, and only
+       the properties to change
+
+    Example usage:
+    ```json
+    {
+        "dataset_id": 123,
+        "metric": "sum_revenue",
+        "expression": "SUM(net_revenue)",
+        "d3format": "$,.2f"
+    }
+    ```
+    """
+    updates = request.updates()
+    await ctx.info(
+        "Updating dataset metric: dataset_id=%s, metric=%r, properties=%s"
+        % (request.dataset_id, request.metric, sorted(updates))
+    )
+
+    try:
+        from sqlalchemy.orm import joinedload, subqueryload
+
+        from superset.commands.dataset.exceptions import (
+            DatasetForbiddenError,
+            DatasetInvalidError,
+            DatasetNotFoundError,
+            DatasetUpdateFailedError,
+        )
+        from superset.commands.dataset.update import UpdateDatasetCommand
+        from superset.connectors.sqla.models import SqlaTable
+        from superset.mcp_service.dataset.dataset_utils import resolve_dataset
+        from superset.mcp_service.utils.url_utils import get_superset_base_url
+
+        eager_options = [
+            subqueryload(SqlaTable.metrics),
+            joinedload(SqlaTable.database),
+        ]
+
+        with 
event_logger.log_context(action="mcp.update_dataset_metric.lookup"):
+            dataset = resolve_dataset(request.dataset_id, eager_options)
+
+        if dataset is None:
+            await ctx.warning("Dataset not found: %s" % (request.dataset_id,))
+            return UpdateDatasetMetricResponse(
+                error=(
+                    f"No dataset found with identifier: {request.dataset_id}."
+                    " Use list_datasets to get valid dataset IDs."
+                ),
+            )
+
+        metrics = list(dataset.metrics)
+        target = _find_metric(metrics, request.metric)
+        if target is None:
+            message = _metric_not_found_message(metrics, request.metric)
+            await ctx.warning("Metric not found: %s" % (request.metric,))
+            return UpdateDatasetMetricResponse(
+                dataset_id=dataset.id,
+                dataset_name=dataset.table_name,
+                error=message,
+            )

Review Comment:
   **Suggestion:** This code performs metric existence checks before ownership 
is enforced, so a caller who can invoke the tool but does not own the dataset 
can distinguish "metric exists" vs "metric not found" and enumerate metric 
names via error messages/suggestions. Enforce ownership immediately after 
dataset resolution (before `_find_metric` / `_metric_not_found_message`) so 
unauthorized callers always get the same forbidden response. [security]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ MCP `update_dataset_metric` leaks metric existence to non-owners.
   - ⚠️ Enables metric name enumeration via distinct error responses.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. In `superset/mcp_service/dataset/tool/update_dataset_metric.py:103-113`, 
the
   `update_dataset_metric` FastMCP tool is exposed with 
`class_permission_name="Dataset"` and
   `method_permission_name="write"`, meaning any role with the Dataset write 
class permission
   can invoke this tool from an MCP client.
   
   2. When the tool runs, it first resolves the dataset via
   `resolve_dataset(request.dataset_id, eager_options)` at
   `update_dataset_metric.py:167-168`, which calls `DatasetDAO.find_by_id` 
through
   `resolve_dataset` (`superset/mcp_service/dataset/dataset_utils.py:25-48`). 
This applies
   the `DatasourceFilter` base filter (`superset/daos/dataset.py:52-61`) for 
data-access
   permissions but does not enforce ownership.
   
   3. After resolution, before any ownership check, the tool materializes all 
metrics and
   attempts to locate the target metric (`metrics = list(dataset.metrics)` and 
`target =
   _find_metric(...)` at `update_dataset_metric.py:179-180`). If the metric is 
not found,
   `_metric_not_found_message` is called and a detailed error is returned at
   `update_dataset_metric.py:181-188`, including the dataset ID/name (from 
`dataset.id` /
   `dataset.table_name`) and suggested/available metric names.
   
   4. Only when the metric *is* found does execution continue to the update 
path, where
   `UpdateDatasetCommand(...).run()` is invoked at 
`update_dataset_metric.py:207-209`. Inside
   `UpdateDatasetCommand.validate` 
(`superset/commands/dataset/update.py:98-112`), ownership
   is enforced via `security_manager.raise_for_ownership`, which can raise
   `DatasetForbiddenError` and is caught at `update_dataset_metric.py:236-243` 
to return the
   generic `"You must be an owner..."` error. A caller who can invoke the tool 
but is not an
   owner can therefore distinguish (a) invalid dataset ID (`dataset is None` at 
lines
   170-177), (b) non-existent metric (`target is None` at lines 179-188 with a
   metric-not-found message listing metrics), and (c) existing metric but 
forbidden update
   (DatasetForbiddenError at lines 236-243), allowing probing and enumeration 
of metric
   existence and names on datasets they cannot edit.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=eb181acb52fc42e5834803c6836769e3&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=eb181acb52fc42e5834803c6836769e3&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/dataset/tool/update_dataset_metric.py
   **Line:** 179:188
   **Comment:**
        *Security: This code performs metric existence checks before ownership 
is enforced, so a caller who can invoke the tool but does not own the dataset 
can distinguish "metric exists" vs "metric not found" and enumerate metric 
names via error messages/suggestions. Enforce ownership immediately after 
dataset resolution (before `_find_metric` / `_metric_not_found_message`) so 
unauthorized callers always get the same forbidden response.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40975&comment_hash=11b3de9b3121d4f5d24870a9b2e3e728dbcc63acbb0fcd57659c9bcd1a4f3c23&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40975&comment_hash=11b3de9b3121d4f5d24870a9b2e3e728dbcc63acbb0fcd57659c9bcd1a4f3c23&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to