codeant-ai-for-open-source[bot] commented on code in PR #40959:
URL: https://github.com/apache/superset/pull/40959#discussion_r3410570192


##########
superset/mcp_service/dashboard/tool/duplicate_dashboard.py:
##########
@@ -0,0 +1,277 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+MCP tool: duplicate_dashboard
+
+Duplicates an existing dashboard, optionally deep-copying its charts.
+Canonical workflow: clone a template dashboard, then edit the copy
+(e.g. to create a regional or staging variant).
+"""
+
+import logging
+from typing import Any
+
+from fastmcp import Context
+from sqlalchemy.exc import SQLAlchemyError
+from superset_core.mcp.decorators import tool, ToolAnnotations
+
+from superset.extensions import event_logger
+from superset.mcp_service.dashboard.schemas import (
+    DashboardInfo,
+    DuplicateDashboardRequest,
+    DuplicateDashboardResponse,
+    serialize_chart_summary,
+)
+from superset.mcp_service.privacy import user_can_view_data_model_metadata
+from superset.mcp_service.utils.url_utils import get_superset_base_url
+from superset.utils import json
+
+logger = logging.getLogger(__name__)
+
+
+def _build_copy_payload(
+    source: Any, dashboard_title: str, duplicate_slices: bool
+) -> dict[str, Any]:
+    """Build the data payload expected by ``CopyDashboardCommand``.
+
+    Mirrors what the frontend "Save as" flow sends to the
+    ``/api/v1/dashboard/<id>/copy/`` endpoint: the source dashboard's
+    current ``json_metadata`` with a ``positions`` key holding the current
+    layout (``position_json``). ``DashboardCopySchema`` requires
+    ``json_metadata``, and ``DashboardDAO.copy_dashboard`` reads
+    ``positions`` from it to remap chart IDs when ``duplicate_slices``
+    is enabled.
+    """
+    try:
+        metadata = json.loads(source.json_metadata or "{}")
+    except (json.JSONDecodeError, TypeError):
+        metadata = {}
+    if not isinstance(metadata, dict):
+        metadata = {}
+
+    try:
+        positions = json.loads(source.position_json or "{}")
+    except (json.JSONDecodeError, TypeError):
+        positions = {}
+    if not isinstance(positions, dict):
+        positions = {}
+
+    metadata["positions"] = positions
+
+    return {
+        "dashboard_title": dashboard_title,
+        "css": source.css,
+        "duplicate_slices": duplicate_slices,
+        "json_metadata": json.dumps(metadata),
+    }
+
+
+def _serialize_new_dashboard(dashboard: Any) -> tuple[DashboardInfo, str]:
+    """Build the response ``DashboardInfo`` and URL for the new dashboard."""
+    from superset.mcp_service.dashboard.schemas import serialize_tag_object
+
+    dashboard_url = f"{get_superset_base_url()}/superset/dashboard/{dashboard.id}/"
+    include_data_model_metadata = user_can_view_data_model_metadata()
+    info = DashboardInfo(
+        id=dashboard.id,
+        dashboard_title=dashboard.dashboard_title,
+        slug=dashboard.slug,
+        description=dashboard.description,
+        published=dashboard.published,
+        created_on=dashboard.created_on,
+        changed_on=dashboard.changed_on,
+        uuid=str(dashboard.uuid) if dashboard.uuid else None,
+        url=dashboard_url,
+        chart_count=len(dashboard.slices),
+        tags=[
+            obj
+            for tag in getattr(dashboard, "tags", [])
+            if (obj := serialize_tag_object(tag)) is not None
+        ],
+        charts=[
+            obj
+            for chart in getattr(dashboard, "slices", [])
+            if (
+                obj := serialize_chart_summary(
+                    chart,
+                    include_data_model_metadata=include_data_model_metadata,
+                )
+            )
+            is not None
+        ],
+    )

Review Comment:
   **Suggestion:** The tool returns dashboard, chart, and tag text fields 
without LLM-context sanitization, unlike the standard dashboard serializers. 
Since these fields are user-controlled, this can surface prompt-injection 
payloads directly in MCP structured responses. Sanitize the response payload 
(or reuse the existing sanitized dashboard serializer path) before returning 
it. [security]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ MCP duplicate_dashboard returns unsanitized dashboard text to LLM.
   - ❌ User dashboards can inject prompts into MCP tool output.
   - ⚠️ Inconsistent with sanitized dashboard serializers in schemas module.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. In Superset, create or edit a dashboard so that `dashboard_title` or `description` on
   the `Dashboard` model (`superset/models/dashboard.py:131-143`) contains a prompt-injection
   payload (for example, instructions to the LLM), and add charts and tags whose
   `slice_name`/`description` and `Tag.name`/`Tag.description` also contain crafted text.
   
   2. From an MCP client, call the `duplicate_dashboard` tool implemented in
   `superset/mcp_service/dashboard/tool/duplicate_dashboard.py:131-144`, passing this
   dashboard's identifier in `DuplicateDashboardRequest.dashboard_id`; the tool copies the
   dashboard and then calls `_serialize_new_dashboard(new_dashboard)` at
   `duplicate_dashboard.py:212`.
   
   3. `_serialize_new_dashboard` at `duplicate_dashboard.py:84-118` builds a `DashboardInfo`
   instance directly from ORM attributes (`dashboard_title`, `slug`, `description`),
   and iterates `dashboard.tags` using `serialize_tag_object`
   (`superset/mcp_service/dashboard/schemas.py:137-147`) and `dashboard.slices` using
   `serialize_chart_summary` (`superset/mcp_service/dashboard/schemas.py:5-29`), all without
   passing the data through `_sanitize_dashboard_info_for_llm_context`.
   
   4. The populated `DashboardInfo` is returned to the MCP client inside
   `DuplicateDashboardResponse` at `duplicate_dashboard.py:235-240` with `dashboard=info`.
   Other dashboard serializers such as `dashboard_serializer` and
   `serialize_dashboard_object` in `superset/mcp_service/dashboard/schemas.py:41-120`
   wrap their `DashboardInfo` via `_sanitize_dashboard_info_for_llm_context` (lines 43-38 in
   that file) to call `sanitize_for_llm_context` and `escape_llm_context_delimiters`. This
   path skips that step, so all user-controlled fields stay unsanitized and prompt-injection
   payloads appear directly in structured MCP responses consumed by the LLM.
   ```
   </details>
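
To illustrate the "sanitize before returning" idea, here is a minimal, self-contained sketch. It does not use Superset's real helpers: `sanitize_text`, `DashboardInfoSketch`, and `sanitize_dashboard_info` are hypothetical stand-ins, and the escaping rules are an assumption, not the behavior of `sanitize_for_llm_context` or `escape_llm_context_delimiters`. The point is only that every user-controlled text field passes through one sanitizer before the response object is built.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional


def sanitize_text(value: Optional[str]) -> Optional[str]:
    """Hypothetical stand-in for an LLM-context sanitizer (not Superset's)."""
    if value is None:
        return None
    # Neutralize angle brackets so user text cannot open or close delimiter tags.
    escaped = value.replace("<", "&lt;").replace(">", "&gt;")
    # Drop non-printable control characters, but keep newlines and tabs.
    return "".join(ch for ch in escaped if ch.isprintable() or ch in "\n\t")


@dataclass(frozen=True)
class DashboardInfoSketch:
    """Hypothetical, reduced version of the response model."""

    id: int
    dashboard_title: str
    description: Optional[str] = None
    tags: tuple[str, ...] = ()
    chart_names: tuple[str, ...] = ()


def sanitize_dashboard_info(info: DashboardInfoSketch) -> DashboardInfoSketch:
    """Return a copy with every user-controlled text field sanitized."""
    return replace(
        info,
        dashboard_title=sanitize_text(info.dashboard_title) or "",
        description=sanitize_text(info.description),
        tags=tuple(sanitize_text(tag) or "" for tag in info.tags),
        chart_names=tuple(sanitize_text(name) or "" for name in info.chart_names),
    )
```

In the real tool, the equivalent step would presumably be to route the `DashboardInfo` built by `_serialize_new_dashboard` through the existing sanitized serializer path rather than a new helper.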
   



##########
superset/mcp_service/dashboard/tool/duplicate_dashboard.py:
##########
@@ -0,0 +1,277 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+MCP tool: duplicate_dashboard
+
+Duplicates an existing dashboard, optionally deep-copying its charts.
+Canonical workflow: clone a template dashboard, then edit the copy
+(e.g. to create a regional or staging variant).
+"""
+
+import logging
+from typing import Any
+
+from fastmcp import Context
+from sqlalchemy.exc import SQLAlchemyError
+from superset_core.mcp.decorators import tool, ToolAnnotations
+
+from superset.extensions import event_logger
+from superset.mcp_service.dashboard.schemas import (
+    DashboardInfo,
+    DuplicateDashboardRequest,
+    DuplicateDashboardResponse,
+    serialize_chart_summary,
+)
+from superset.mcp_service.privacy import user_can_view_data_model_metadata
+from superset.mcp_service.utils.url_utils import get_superset_base_url
+from superset.utils import json
+
+logger = logging.getLogger(__name__)
+
+
+def _build_copy_payload(
+    source: Any, dashboard_title: str, duplicate_slices: bool
+) -> dict[str, Any]:
+    """Build the data payload expected by ``CopyDashboardCommand``.
+
+    Mirrors what the frontend "Save as" flow sends to the
+    ``/api/v1/dashboard/<id>/copy/`` endpoint: the source dashboard's
+    current ``json_metadata`` with a ``positions`` key holding the current
+    layout (``position_json``). ``DashboardCopySchema`` requires
+    ``json_metadata``, and ``DashboardDAO.copy_dashboard`` reads
+    ``positions`` from it to remap chart IDs when ``duplicate_slices``
+    is enabled.
+    """
+    try:
+        metadata = json.loads(source.json_metadata or "{}")
+    except (json.JSONDecodeError, TypeError):
+        metadata = {}
+    if not isinstance(metadata, dict):
+        metadata = {}
+
+    try:
+        positions = json.loads(source.position_json or "{}")
+    except (json.JSONDecodeError, TypeError):
+        positions = {}
+    if not isinstance(positions, dict):
+        positions = {}
+
+    metadata["positions"] = positions

Review Comment:
   **Suggestion:** Falling back to an empty `positions` object when 
`position_json` is missing/invalid will silently drop all charts in the copied 
dashboard. `DashboardDAO.copy_dashboard`/`set_dash_metadata` rebuilds dashboard 
slices from `positions`, so `{}` produces an empty slice list even when the 
source dashboard had charts. Fail fast on invalid/missing layout or build a 
positions map from source charts before calling the copy command. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ MCP duplicate_dashboard can create empty dashboards unexpectedly.
   - ⚠️ Source dashboards with bad layout copy without any charts.
   - ⚠️ LLM workflows see misleading empty dashboard copies.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Create or identify a dashboard object with charts but invalid or missing layout JSON:
   in `superset/models/dashboard.py:131-147` the `Dashboard` model has `position_json` and
   `slices` fields; use a DB migration or shell to set `position_json` to `NULL` or a
   non-JSON string for a dashboard that still has non-empty `slices`.
   
   2. Call the MCP tool `duplicate_dashboard` defined in
   `superset/mcp_service/dashboard/tool/duplicate_dashboard.py:131-144`, passing that
   dashboard's ID/UUID/slug via `DuplicateDashboardRequest.dashboard_id`.
   
   3. Inside `duplicate_dashboard`, `_build_copy_payload` is invoked at
   `duplicate_dashboard.py:187-189`, which parses `source.position_json` at lines `67-72`;
   when `position_json` is `NULL`, the `or "{}"` fallback yields `{}`, and when it is
   invalid, the `except` block or the type check coerces `positions` to `{}`. In both cases
   `metadata["positions"] = positions` at line 74 puts an empty `positions` dict into the
   copied payload.
   
   4. `CopyDashboardCommand.run()` in `superset/commands/dashboard/copy.py:11-14` calls
   `DashboardDAO.copy_dashboard`, which in turn calls `DashboardDAO.set_dash_metadata(dash,
   metadata, old_to_new_slice_ids)` at `superset/daos/dashboard.py:43-44`; since
   `data.get("positions")` is `{}` (non-None), `set_dash_metadata` at `dashboard.py:22-36`
   computes `slice_ids = []`, sets `dashboard.slices = current_slices` where `current_slices`
   is `[]`, and writes the new dashboard with no charts even though the source had charts,
   causing the duplicated dashboard to lose all charts.
   ```
   </details>
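
A minimal sketch of the "fail fast" option follows. It is self-contained and uses the standard-library `json` module rather than `superset.utils.json`; `InvalidLayoutError` and `parse_positions` are hypothetical names, and passing the source chart count as an argument is an assumption about how the check could be wired in. The idea is to raise instead of substituting `{}` whenever the layout is unusable and the source dashboard still has charts.

```python
from __future__ import annotations

import json
from typing import Any, Optional


class InvalidLayoutError(ValueError):
    """Hypothetical error: the source layout cannot be copied safely."""


def parse_positions(position_json: Optional[str], chart_count: int) -> dict[str, Any]:
    """Parse a dashboard layout, refusing to silently drop charts."""
    try:
        positions = json.loads(position_json or "{}")
    except (json.JSONDecodeError, TypeError) as ex:
        raise InvalidLayoutError("position_json is not valid JSON") from ex

    if not isinstance(positions, dict):
        raise InvalidLayoutError("position_json must decode to a JSON object")

    # An empty layout is acceptable only when there are no charts to lose.
    if not positions and chart_count > 0:
        raise InvalidLayoutError(
            f"layout is empty but the source dashboard has {chart_count} chart(s)"
        )
    return positions
```

A caller such as `_build_copy_payload` could then surface the error to the MCP client instead of creating an empty copy. The alternative mentioned above, rebuilding a positions map from the source charts, is not sketched here because it depends on the layout schema.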
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
