codeant-ai-for-open-source[bot] commented on code in PR #38416:
URL: https://github.com/apache/superset/pull/38416#discussion_r2930742920
##########
superset/mcp_service/chart/tool/list_charts.py:
##########
@@ -142,9 +188,16 @@ def _serialize_chart(
)
# Apply field filtering via serialization context
- # Always use columns_requested (either explicit select_columns or
defaults)
- # This triggers ChartInfo._filter_fields_by_context for each chart
columns_to_filter = result.columns_requested
+ # Re-add popularity_score if it was originally requested
+ # (it was stripped before the DAO query since it's computed)
+ if (
+ request.select_columns
+ and "popularity_score" in request.select_columns
+ and columns_to_filter
+ and "popularity_score" not in columns_to_filter
+ ):
+ columns_to_filter = list(columns_to_filter) + ["popularity_score"]
Review Comment:
**Suggestion:** Internal column injection for score computation leaks into
the response shape. When a client requests only `popularity_score`, the code
adds `id` to `select_columns` for DAO loading and then reuses those mutated
columns for output filtering, so `id` is returned even though it was not
requested. Build response filtering from the original request columns instead
of DAO-expanded columns. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ list_charts response shape violates requested select_columns contract.
- ⚠️ Unrequested id field increases MCP response payload tokens.
- ⚠️ Client-side strict column parsers may mis-handle responses.
```
</details>
```suggestion
columns_to_filter = (
list(request.select_columns)
if request.select_columns
else result.columns_requested
)
# Re-add popularity_score if it was originally requested
# (it was stripped before the DAO query since it's computed)
if (
request.select_columns
and "popularity_score" in request.select_columns
and columns_to_filter
and "popularity_score" not in columns_to_filter
):
columns_to_filter = list(columns_to_filter) +
["popularity_score"]
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. MCP registers `list_charts` as a callable tool via imports in
`superset/mcp_service/app.py:393-399`; registration is verified in
`tests/unit_tests/mcp_service/test_mcp_tool_registration.py:31`.
2. Call MCP `tools/call` with `name=\"list_charts\"` and params
`{\"select_columns\":
[\"popularity_score\"]}`; middleware resolves tool by message name at
`superset/mcp_service/middleware.py:256`.
3. In `superset/mcp_service/chart/tool/list_charts.py:155-164`, code removes
`popularity_score` for DAO, then injects `"id"` into `dao_columns` to
compute scores.
4. `ModelListCore.run_tool()` sets `columns_requested = select_columns` at
`superset/mcp_service/mcp_core.py:154-160`, so response metadata now carries
`["id"]`
(internal column) rather than original request.
5. Back in `list_charts`, filtering uses `result.columns_requested` at
`superset/mcp_service/chart/tool/list_charts.py:191`, then re-adds
`popularity_score` at
`:194-200`, resulting in `["id","popularity_score"]`.
6. Serializer enforces context-based field output in
`superset/mcp_service/chart/schemas.py:151-155`, so response includes leaked
`id` even
though caller requested only `popularity_score`.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/chart/tool/list_charts.py
**Line:** 191:200
**Comment:**
*Logic Error: Internal column injection for score computation leaks
into the response shape. When a client requests only `popularity_score`, the
code adds `id` to `select_columns` for DAO loading and then reuses those
mutated columns for output filtering, so `id` is returned even though it was
not requested. Build response filtering from the original request columns
instead of DAO-expanded columns.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=7fb7a688a22b462515eeb65bb2133a9e12056269b72dfa413a13a777c89b340c&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=7fb7a688a22b462515eeb65bb2133a9e12056269b72dfa413a13a777c89b340c&reaction=dislike'>👎</a>
##########
superset/mcp_service/dashboard/tool/list_dashboards.py:
##########
@@ -144,9 +179,16 @@ def _serialize_dashboard(
)
# Apply field filtering via serialization context
- # Always use columns_requested (either explicit select_columns or defaults)
- # This triggers DashboardInfo._filter_fields_by_context for each dashboard
columns_to_filter = result.columns_requested
+ # Re-add popularity_score if it was originally requested
+ # (it was stripped before the DAO query since it's computed)
+ if (
+ request.select_columns
+ and "popularity_score" in request.select_columns
+ and columns_to_filter
+ and "popularity_score" not in columns_to_filter
+ ):
+ columns_to_filter = list(columns_to_filter) + ["popularity_score"]
Review Comment:
**Suggestion:** Injecting `id` into `dao_columns` for score computation
leaks `id` into the response when clients request `popularity_score` without
`id`, because field filtering later uses `result.columns_requested` (which now
contains the injected column). Build `columns_to_filter` from the original
`request.select_columns` when provided so only explicitly requested fields are
returned. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ list_dashboards returns unrequested `id` field.
- ⚠️ select_columns contract breaks for dashboard item fields.
- ⚠️ MCP clients may mis-handle unexpected response shape.
```
</details>
```suggestion
columns_to_filter = (
list(request.select_columns)
if request.select_columns
else result.columns_requested
)
# Re-add popularity_score if it was originally requested
# (it was stripped before the DAO query since it's computed)
if (
request.select_columns
and "popularity_score" in request.select_columns
and columns_to_filter
and "popularity_score" not in columns_to_filter
):
columns_to_filter = list(columns_to_filter) + ["popularity_score"]
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Start MCP server where `list_dashboards` is registered via import in
`superset/mcp_service/app.py:23-28`.
2. Call tool `list_dashboards` with
`select_columns=["dashboard_title","popularity_score"]` and default sorting
(not
`order_column="popularity_score"`), entering standard path in
`superset/mcp_service/dashboard/tool/list_dashboards.py:133-166`.
3. In that path, code strips computed field then injects `id`
(`list_dashboards.py:149-155`) and passes injected list to
`ModelListCore.run_tool(...)`
(`list_dashboards.py:157-166`).
4. `ModelListCore` sets `columns_requested = select_columns` exactly as
passed
(`superset/mcp_service/mcp_core.py:154-160`), so response metadata now
includes injected
`id`.
5. Final serialization uses `columns_to_filter = result.columns_requested`
(`list_dashboards.py:182`) and dashboard field filtering by context
(`superset/mcp_service/dashboard/schemas.py:373-379`), causing each
dashboard object to
include `id` although client did not request it.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/dashboard/tool/list_dashboards.py
**Line:** 182:191
**Comment:**
*Logic Error: Injecting `id` into `dao_columns` for score computation
leaks `id` into the response when clients request `popularity_score` without
`id`, because field filtering later uses `result.columns_requested` (which now
contains the injected column). Build `columns_to_filter` from the original
`request.select_columns` when provided so only explicitly requested fields are
returned.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=18965dae8a03634fe54c7be5f711f6bf666169c9e53993275402d8dc5c539c7a&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=18965dae8a03634fe54c7be5f711f6bf666169c9e53993275402d8dc5c539c7a&reaction=dislike'>👎</a>
##########
superset/mcp_service/dataset/tool/list_datasets.py:
##########
@@ -152,9 +190,16 @@ def _serialize_dataset(
)
# Apply field filtering via serialization context
- # Always use columns_requested (either explicit select_columns or
defaults)
- # This triggers DatasetInfo._filter_fields_by_context for each dataset
columns_to_filter = result.columns_requested
+ # Re-add popularity_score if it was originally requested
+ # (it was stripped before the DAO query since it's computed)
+ if (
+ request.select_columns
+ and "popularity_score" in request.select_columns
+ and columns_to_filter
+ and "popularity_score" not in columns_to_filter
+ ):
+ columns_to_filter = list(columns_to_filter) + ["popularity_score"]
Review Comment:
**Suggestion:** Internal helper columns are leaking into API output. When
`select_columns=["popularity_score"]`, the code injects `id` for internal
scoring and later uses `result.columns_requested` for response filtering, so
`id` is returned even though it was not requested. Build `columns_to_filter`
from the original request columns instead of the DAO-loaded columns. [logic
error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ list_datasets returns unrequested `id` field.
- ⚠️ select_columns contract becomes inconsistent for clients.
```
</details>
```suggestion
columns_to_filter = (
list(request.select_columns)
if request.select_columns
else result.columns_requested
)
# Re-add popularity_score if it was originally requested
# (it was stripped before the DAO query since it's computed)
if (
request.select_columns
and "popularity_score" in request.select_columns
and columns_to_filter
and "popularity_score" not in columns_to_filter
):
columns_to_filter = list(columns_to_filter) +
["popularity_score"]
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Call MCP tool `list_datasets` through FastMCP client (same path used in
tests at
`tests/unit_tests/mcp_service/dataset/tool/test_dataset_tools.py:186-190`)
with
`select_columns=["popularity_score"]` and default ordering.
2. Request reaches `list_datasets()` at
`superset/mcp_service/dataset/tool/list_datasets.py:77`; non-popularity
branch runs, and
`dao_columns` injects `"id"` at `:154-164` when popularity is requested.
3. `ModelListCore.run_tool()` stores injected columns as `columns_requested`
(`superset/mcp_service/mcp_core.py:154-160,210`), so response metadata now
thinks `"id"`
was requested.
4. Serialization filter uses `result.columns_requested` at
`list_datasets.py:193`, then
adds popularity at `:196-202`; dataset serializer
(`superset/mcp_service/dataset/schemas.py:164-173`) returns both `id` and
`popularity_score`, leaking unrequested `id`.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/dataset/tool/list_datasets.py
**Line:** 193:202
**Comment:**
*Logic Error: Internal helper columns are leaking into API output. When
`select_columns=["popularity_score"]`, the code injects `id` for internal
scoring and later uses `result.columns_requested` for response filtering, so
`id` is returned even though it was not requested. Build `columns_to_filter`
from the original request columns instead of the DAO-loaded columns.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=b09680e2253c0db28b8efc5e12b3462298446bb7611fbdaa734f2137c81bc308&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=b09680e2253c0db28b8efc5e12b3462298446bb7611fbdaa734f2137c81bc308&reaction=dislike'>👎</a>
##########
superset/mcp_service/dataset/tool/list_datasets.py:
##########
@@ -176,3 +221,78 @@ def _serialize_dataset(
)
)
raise
+
+
+def _list_datasets_by_popularity(
+ request: ListDatasetsRequest,
+ dao_class: Any,
+ serializer: Callable[..., dict[str, Any] | None],
+ all_columns: list[str],
+ ctx: Context,
+) -> DatasetList:
+ """Two-pass listing: sort all matching datasets by popularity score."""
+ from superset.mcp_service.common.schema_discovery import
DATASET_SORTABLE_COLUMNS
+
+ sorted_ids, scores, total_count = get_popularity_sorted_ids(
+ compute_fn=compute_dataset_popularity,
+ dao_class=dao_class,
+ filters=request.filters,
+ search=request.search,
+ search_columns=DATASET_SEARCH_COLUMNS,
+ order_direction=request.order_direction,
+ )
+
+ # Apply pagination to sorted IDs
+ page = max(request.page - 1, 0)
+ page_size = request.page_size
+ start = page * page_size
+ end = start + page_size
+
+ # Fetch full models for page IDs
+ if page_ids := sorted_ids[start:end]:
+ items = dao_class.find_by_ids(page_ids)
+ id_to_item = {item.id: item for item in items}
+ ordered_items = [id_to_item[pid] for pid in page_ids if pid in
id_to_item]
+ else:
+ ordered_items = []
+
+ # Serialize
+ select_columns = request.select_columns or DEFAULT_DATASET_COLUMNS
+ if "popularity_score" not in select_columns:
+ select_columns = list(select_columns) + ["popularity_score"]
+
+ ds_objs = []
+ for item in ordered_items:
+ obj = serializer(item, select_columns)
+ if obj is not None:
+ ds_objs.append(obj)
+
+ attach_popularity_scores(ds_objs, scores)
+
+ total_pages = (total_count + page_size - 1) // page_size if page_size > 0
else 0
+ pagination_info = PaginationInfo(
+ page=page,
+ page_size=page_size,
+ total_count=total_count,
+ total_pages=total_pages,
+ has_next=page < total_pages - 1,
+ has_previous=page > 0,
+ )
+
+ return DatasetList(
+ datasets=ds_objs,
+ count=len(ds_objs),
+ total_count=total_count,
+ page=page,
+ page_size=page_size,
+ total_pages=total_pages,
+ has_previous=page > 0,
+ has_next=page < total_pages - 1,
+ columns_requested=select_columns,
+ columns_loaded=select_columns,
+ columns_available=all_columns,
+ sortable_columns=DATASET_SORTABLE_COLUMNS,
+ filters_applied=request.filters if isinstance(request.filters, list)
else [],
+ pagination=pagination_info,
+ timestamp=datetime.now(timezone.utc),
+ )
Review Comment:
**Suggestion:** Sorting by popularity currently forces `popularity_score`
into `columns_requested` even when the client did not request it, which changes
the default response shape and breaks `select_columns` semantics. Keep
`popularity_score` only in loaded/internal columns and preserve requested
columns separately. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Popularity sort changes default dataset response shape.
- ⚠️ `columns_requested` no longer reflects client input.
```
</details>
```suggestion
# Serialize
requested_columns = request.select_columns or DEFAULT_DATASET_COLUMNS
loaded_columns = list(requested_columns)
if "popularity_score" not in loaded_columns:
loaded_columns.append("popularity_score")
ds_objs = []
for item in ordered_items:
obj = serializer(item, loaded_columns)
if obj is not None:
ds_objs.append(obj)
attach_popularity_scores(ds_objs, scores)
total_pages = (total_count + page_size - 1) // page_size if page_size >
0 else 0
pagination_info = PaginationInfo(
page=page,
page_size=page_size,
total_count=total_count,
total_pages=total_pages,
has_next=page < total_pages - 1,
has_previous=page > 0,
)
return DatasetList(
datasets=ds_objs,
count=len(ds_objs),
total_count=total_count,
page=page,
page_size=page_size,
total_pages=total_pages,
has_previous=page > 0,
has_next=page < total_pages - 1,
columns_requested=requested_columns,
columns_loaded=loaded_columns,
columns_available=all_columns,
sortable_columns=DATASET_SORTABLE_COLUMNS,
filters_applied=request.filters if isinstance(request.filters, list)
else [],
pagination=pagination_info,
timestamp=datetime.now(timezone.utc),
)
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Invoke `list_datasets` tool via MCP (tool is registered through
`superset/mcp_service/app.py:29-32,93-95`) with
`order_column="popularity_score"` and no
`select_columns`.
2. `list_datasets()` routes into `_list_datasets_by_popularity()` at
`superset/mcp_service/dataset/tool/list_datasets.py:133-137`.
3. In popularity path, code forces `"popularity_score"` into
`select_columns` at
`list_datasets.py:260-263` and sets both `columns_requested` and
`columns_loaded` to this
augmented list at `:291-292`.
4. Outer serializer uses `result.columns_requested`
(`list_datasets.py:193,208-210`), and
dataset field filtering (`superset/mcp_service/dataset/schemas.py:164-173`)
includes
popularity_score in item output even though client did not provide
`select_columns`.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/dataset/tool/list_datasets.py
**Line:** 260:298
**Comment:**
*Logic Error: Sorting by popularity currently forces `popularity_score`
into `columns_requested` even when the client did not request it, which changes
the default response shape and breaks `select_columns` semantics. Keep
`popularity_score` only in loaded/internal columns and preserve requested
columns separately.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=58e32f81c83c0db0a83026134b66dac258e4790a4b044d0d0602713ede0c8a01&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=58e32f81c83c0db0a83026134b66dac258e4790a4b044d0d0602713ede0c8a01&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]