codeant-ai-for-open-source[bot] commented on code in PR #40328:
URL: https://github.com/apache/superset/pull/40328#discussion_r3280730045
##########
superset/mcp_service/dashboard/schemas.py:
##########
@@ -974,3 +1171,58 @@ def serialize_dashboard_object(dashboard: Any) ->
DashboardInfo:
else [],
)
)
+
+
+def _sanitize_dashboard_layout_for_llm_context(
+ layout: DashboardLayout,
+) -> DashboardLayout:
+ """Wrap layout text fields before LLM exposure."""
+ payload = layout.model_dump(mode="python")
+ payload["dashboard_title"] = sanitize_for_llm_context(
+ payload.get("dashboard_title"),
+ field_path=("dashboard_title",),
+ )
+ payload["tabs"] = [
+ {
+ **tab,
+ "name": sanitize_for_llm_context(
+ tab.get("name"),
+ field_path=("tabs", str(index), "name"),
+ ),
+ }
+ for index, tab in enumerate(payload.get("tabs", []))
+ ]
+ payload["charts"] = [
+ {
+ **chart,
+ "slice_name": sanitize_for_llm_context(
+ chart.get("slice_name"),
+ field_path=("charts", str(index), "slice_name"),
+ ),
+ "tab_path": [
+ sanitize_for_llm_context(
+ name,
+ field_path=("charts", str(index), "tab_path",
str(part_index)),
+ )
+ for part_index, name in enumerate(chart.get("tab_path", []) or
[])
+ ],
+ }
+ for index, chart in enumerate(payload.get("charts", []))
+ ]
+ return DashboardLayout.model_validate(payload)
+
+
+def dashboard_layout_serializer(dashboard: "Dashboard") -> DashboardLayout:
+ """Serialize a Dashboard model to a parsed DashboardLayout."""
+ position_json_str = getattr(dashboard, "position_json", None)
+ tabs, charts = _extract_layout_from_position(position_json_str)
+ return _sanitize_dashboard_layout_for_llm_context(
+ DashboardLayout(
+ id=dashboard.id,
+ dashboard_title=dashboard.dashboard_title or "Untitled",
+ uuid=str(dashboard.uuid) if dashboard.uuid else None,
+ tabs=tabs,
+ charts=charts,
+ has_layout=bool(position_json_str),
Review Comment:
**Suggestion:** `has_layout` is derived from `bool(position_json_str)`, so
malformed or whitespace-only `position_json` values are reported as `true` even
when parsing returns no tabs/charts. This creates an inconsistent contract
(`has_layout=true` with empty layout data) and can mislead callers into
treating broken layouts as valid. Compute `has_layout` from successful parsed
layout state instead (for example, based on
`_parse_position_json`/`_extract_layout_from_position` results). [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ MCP get_dashboard_layout may report layouts for invalid JSON.
- ⚠️ Agents misinterpret corrupted layouts as valid, confusing responses.
- ⚠️ Callers must infer validity from empty tabs/charts manually.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Note the layout extraction logic in
`superset/mcp_service/dashboard/schemas.py:_extract_layout_from_position`
(around lines
829–885 in the PR), which calls `_parse_position_json(position_json_str)`
and returns
`(tabs, charts)`; when `position_json_str` is invalid JSON or not a dict,
`_parse_position_json` returns `None` and `_extract_layout_from_position`
returns `tabs =
[]`, `charts = []`.
2. Observe the serializer `dashboard_layout_serializer` in
`superset/mcp_service/dashboard/schemas.py` (lines 1215–1228), which builds a
`DashboardLayout` with `tabs` and `charts` from
`_extract_layout_from_position(position_json_str)` but sets
`has_layout=bool(position_json_str)` (line 1226), based solely on the raw
string's
truthiness.
3. The unit test `test_extract_layout_handles_invalid_json` in
`superset/tests/unit_tests/mcp_service/dashboard/tool/test_get_dashboard_layout.py:87-90`
demonstrates that invalid `position_json` values like `"{ not json"` are
expected inputs:
`_extract_layout_from_position("{ not json")` returns `tabs == []` and
`charts == []`,
proving the code anticipates invalid JSON in the DB.
4. In a real run, if a dashboard row has `position_json` set to a non-empty
but invalid
JSON string (as in step 3), calling the MCP tool `get_dashboard_layout`
(registered in
`superset/mcp_service/app.py:609` and implemented in
`superset/mcp_service/dashboard/tool/get_dashboard_layout.py:55-92`) will
invoke
`dashboard_layout_serializer`; this will emit a `DashboardLayout` where
`tabs == []`,
`charts == []`, but `has_layout` is `True` (because
`bool(position_json_str)` is `True`
for any non-empty string), creating an inconsistent contract that reports a
layout as
present even though parsing produced no usable layout data.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=012ceeb50216409c96f7dbbba9592bef&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=012ceeb50216409c96f7dbbba9592bef&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/dashboard/schemas.py
**Line:** 1226:1226
**Comment:**
*Logic Error: `has_layout` is derived from `bool(position_json_str)`,
so malformed or whitespace-only `position_json` values are reported as `true`
even when parsing returns no tabs/charts. This creates an inconsistent contract
(`has_layout=true` with empty layout data) and can mislead callers into
treating broken layouts as valid. Compute `has_layout` from successful parsed
layout state instead (for example, based on
`_parse_position_json`/`_extract_layout_from_position` results).
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40328&comment_hash=f8bf83dc1e1e2b61a40f6dc3605fed981020b4898e3ad020415748fa59bddfd1&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40328&comment_hash=f8bf83dc1e1e2b61a40f6dc3605fed981020b4898e3ad020415748fa59bddfd1&reaction=dislike'>👎</a>
##########
superset/mcp_service/dashboard/tool/get_dashboard_layout.py:
##########
@@ -0,0 +1,122 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Get dashboard layout FastMCP tool
+
+Companion to get_dashboard_info: returns the parsed dashboard layout
+(tabs and chart positions) extracted from position_json. Use this
+when get_dashboard_info's omitted_fields hint indicates position_json
+was stripped and structured layout data is needed for analysis.
+"""
+
+import logging
+from datetime import datetime, timezone
+
+from fastmcp import Context
+from sqlalchemy.orm import subqueryload
+from superset_core.mcp.decorators import tool, ToolAnnotations
+
+from superset.extensions import event_logger
+from superset.mcp_service.dashboard.schemas import (
+ dashboard_layout_serializer,
+ DashboardError,
+ DashboardLayout,
+ GetDashboardLayoutRequest,
+)
+from superset.mcp_service.mcp_core import ModelGetInfoCore
+
+logger = logging.getLogger(__name__)
+
+
+@tool(
+ tags=["discovery"],
+ class_permission_name="Dashboard",
+ annotations=ToolAnnotations(
+ title="Get dashboard layout",
+ readOnlyHint=True,
+ destructiveHint=False,
+ ),
+)
+async def get_dashboard_layout(
+ request: GetDashboardLayoutRequest, ctx: Context
+) -> DashboardLayout | DashboardError:
+ """
+ Get parsed dashboard layout by ID, UUID, or slug.
+
+ Returns the tabs and chart positions extracted from the dashboard's
+ position_json. get_dashboard_info omits position_json to keep responses
+ small; call this tool when you need the structured layout (e.g. to
+ explain which charts live under which tab, or to locate a chart by
+ its parent tab).
+
+ Example usage:
+ ```json
+ {
+ "identifier": 123
+ }
+ ```
+ """
+ await ctx.info("Retrieving dashboard layout: identifier=%s" %
(request.identifier,))
+
+ try:
+ from superset.daos.dashboard import DashboardDAO
+ from superset.models.dashboard import Dashboard
+
+ eager_options = [subqueryload(Dashboard.slices)]
Review Comment:
**Suggestion:** The tool eagerly loads `Dashboard.slices`, but the
serializer only reads `id`, `dashboard_title`, `uuid`, and `position_json`.
This adds an unnecessary extra relationship load for every request and can
significantly increase query cost and memory usage on large dashboards. Remove
this eager load for this tool path. [performance]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Each get_dashboard_layout call eagerly loads all dashboard slices.
- ⚠️ Large dashboards incur unnecessary extra SQL queries.
- ⚠️ Increased memory usage for unused `Dashboard.slices` collections.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. The MCP tool `get_dashboard_layout` is registered and exposed in
`superset/mcp_service/app.py` at line 609 (`get_dashboard_layout` is
included in the tools
tuple) and implemented in
`superset/mcp_service/dashboard/tool/get_dashboard_layout.py:55-92`; MCP
clients call this
tool to fetch parsed dashboard layout information.
2. In `get_dashboard_layout` (file
`superset/mcp_service/dashboard/tool/get_dashboard_layout.py`, lines 76-92),
the code
imports `DashboardDAO` and `Dashboard`, then defines `eager_options =
[subqueryload(Dashboard.slices)]` (line 80) and passes this list as
`query_options` when
constructing `ModelGetInfoCore` (lines 83-21).
3. `ModelGetInfoCore` in `superset/mcp_service/mcp_core.py` uses
`self.query_options` when
building the base query: `_base_filtered_query` (lines 51-69) calls
`query.options(*self.query_options)` if any query options are provided.
Passing
`subqueryload(Dashboard.slices)` forces SQLAlchemy to execute additional
queries and fully
load `Dashboard.slices` for each dashboard returned.
4. The serializer configured for this tool, `dashboard_layout_serializer` in
`superset/mcp_service/dashboard/schemas.py:36-49`, only accesses
`dashboard.id`,
`dashboard.dashboard_title`, `dashboard.uuid`, and
`dashboard.position_json`, and never
touches `dashboard.slices`; thus, every call to `get_dashboard_layout` pays
the cost of
eagerly loading all chart slices (extra SQL and memory) without using them,
which is
particularly wasteful for dashboards with many charts, as demonstrated by
the tool-level
tests in
`superset/tests/unit_tests/mcp_service/dashboard/tool/test_get_dashboard_layout.py`
that
exercise `get_dashboard_layout` as the main entrypoint.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=9e3583acd2b944ba91e963d144bbc374&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=9e3583acd2b944ba91e963d144bbc374&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/dashboard/tool/get_dashboard_layout.py
**Line:** 80:80
**Comment:**
*Performance: The tool eagerly loads `Dashboard.slices`, but the
serializer only reads `id`, `dashboard_title`, `uuid`, and `position_json`.
This adds an unnecessary extra relationship load for every request and can
significantly increase query cost and memory usage on large dashboards. Remove
this eager load for this tool path.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40328&comment_hash=a21064661809464f91125028c88293e836112ac76a6d5d9c81a8297920ae7231&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40328&comment_hash=a21064661809464f91125028c88293e836112ac76a6d5d9c81a8297920ae7231&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]