codeant-ai-for-open-source[bot] commented on code in PR #38416:
URL: https://github.com/apache/superset/pull/38416#discussion_r2910653665
##########
superset/mcp_service/dataset/tool/list_datasets.py:
##########
@@ -115,32 +121,45 @@ def _serialize_dataset(
"""Serialize dataset (filtering via model_serializer)."""
return serialize_dataset_object(obj)
- # Create tool with standard serialization
- tool = ModelListCore(
- dao_class=DatasetDAO,
- output_schema=DatasetInfo,
- item_serializer=_serialize_dataset,
- filter_type=DatasetFilter,
- default_columns=DEFAULT_DATASET_COLUMNS,
- search_columns=["schema", "sql", "table_name", "uuid"],
- list_field_name="datasets",
- output_list_schema=DatasetList,
- all_columns=all_columns,
- sortable_columns=DATASET_SORTABLE_COLUMNS,
- logger=logger,
- )
-
- with event_logger.log_context(action="mcp.list_datasets.query"):
- result = tool.run_tool(
- filters=request.filters,
- search=request.search,
- select_columns=request.select_columns,
- order_column=request.order_column,
- order_direction=request.order_direction,
- page=max(request.page - 1, 0),
- page_size=request.page_size,
+ # Two-pass approach when sorting by popularity_score
+ if request.order_column == "popularity_score":
+ with
event_logger.log_context(action="mcp.list_datasets.popularity_sort"):
+ result = _list_datasets_by_popularity(
+ request, DatasetDAO, _serialize_dataset, all_columns, ctx
+ )
+ else:
+ # Create tool with standard serialization
+ list_core = ModelListCore(
+ dao_class=DatasetDAO,
+ output_schema=DatasetInfo,
+ item_serializer=_serialize_dataset,
+ filter_type=DatasetFilter,
+ default_columns=DEFAULT_DATASET_COLUMNS,
+ search_columns=DATASET_SEARCH_COLUMNS,
+ list_field_name="datasets",
+ output_list_schema=DatasetList,
+ all_columns=all_columns,
+ sortable_columns=DATASET_SORTABLE_COLUMNS,
+ logger=logger,
)
+ with event_logger.log_context(action="mcp.list_datasets.query"):
+ result = list_core.run_tool(
+ filters=request.filters,
+ search=request.search,
+ select_columns=request.select_columns,
+ order_column=request.order_column,
+ order_direction=request.order_direction,
+ page=max(request.page - 1, 0),
+ page_size=request.page_size,
+ )
+
+ # Attach popularity scores if requested in select_columns
+ if request.select_columns and "popularity_score" in
request.select_columns:
+ if ds_ids := [d.id for d in result.datasets if d.id is not
None]:
+ scores = compute_dataset_popularity(ds_ids)
+ attach_popularity_scores(result.datasets, scores)
Review Comment:
**Suggestion:** The dataset list tool passes `select_columns` directly
(including `popularity_score`) into `ModelListCore.run_tool` and thus to
`DatasetDAO.list(columns=...)`, but `popularity_score` is computed rather than
stored, so selecting it will cause the DAO query to fail instead of letting the
later scoring code augment the results. [logic error]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ list_datasets fails when selecting popularity_score column.
- ⚠️ MCP clients cannot retrieve dataset popularity metadata.
- ⚠️ New popularity_score feature unusable without workaround for datasets.
```
</details>
```suggestion
else:
# Create tool with standard serialization
list_core = ModelListCore(
dao_class=DatasetDAO,
output_schema=DatasetInfo,
item_serializer=_serialize_dataset,
filter_type=DatasetFilter,
default_columns=DEFAULT_DATASET_COLUMNS,
search_columns=DATASET_SEARCH_COLUMNS,
list_field_name="datasets",
output_list_schema=DatasetList,
all_columns=all_columns,
sortable_columns=DATASET_SORTABLE_COLUMNS,
logger=logger,
)
# popularity_score is computed, so do not request it as a
physical column
original_select_columns = list(request.select_columns or [])
dao_select_columns = [
col for col in original_select_columns if col !=
"popularity_score"
] or None
with
event_logger.log_context(action="mcp.list_datasets.query"):
result = list_core.run_tool(
filters=request.filters,
search=request.search,
select_columns=dao_select_columns,
order_column=request.order_column,
order_direction=request.order_direction,
page=max(request.page - 1, 0),
page_size=request.page_size,
)
# Keep columns_requested aligned with what the client asked
for
if original_select_columns:
result.columns_requested = original_select_columns
# Attach popularity scores if requested in select_columns
if "popularity_score" in original_select_columns:
if ds_ids := [d.id for d in result.datasets if d.id is
not None]:
scores = compute_dataset_popularity(ds_ids)
attach_popularity_scores(result.datasets, scores)
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Start the MCP server (Superset MCP service) so tools from
`superset.mcp_service.app`
are available, including the `list_datasets` tool which is implemented in
`superset/mcp_service/dataset/tool/list_datasets.py:70-183`.
2. From an MCP client (mirroring the unit tests in
`superset/tests/unit_tests/mcp_service/dataset/tool/test_dataset_tools.py:1121-1131`),
construct a `ListDatasetsRequest` (schema in
`superset/mcp_service/dataset/schemas.py:212-255`) with
`select_columns=["id",
"table_name", "schema", "uuid", "popularity_score"]`, leaving
`order_column=None` (so
popularity sorting is NOT requested), and call the `list_datasets` tool via
`Client.call_tool("list_datasets", {"request": request.model_dump()})`.
3. The `list_datasets` function in
`superset/mcp_service/dataset/tool/list_datasets.py:105-161` executes:
because
`request.order_column` is not `"popularity_score"`, it goes into the `else:`
branch at
line 131, constructs `ModelListCore`, and calls `list_core.run_tool(...,
select_columns=request.select_columns, ...)` at lines 146-155 with
`select_columns` still
containing `"popularity_score"`.
4. Inside `ModelListCore.run_tool` in
`superset/mcp_service/mcp_core.py:135-174`,
`select_columns` is parsed and used as `columns_to_load`, and the method
calls
`DatasetDAO.list(..., columns=columns_to_load)` at lines 165-174; the
injected
`DatasetDAO` implementation (from `superset/daos/dataset.py:46` via
`superset.daos.base.BaseDAO`) attempts to build a SQLAlchemy query selecting
a column
named `popularity_score` on the `SqlaTable` model, but `popularity_score` is
not a real
ORM column (it is only defined as a Pydantic field on `DatasetInfo` in
`superset/mcp_service/dataset/schemas.py:93-145`), causing the DAO/query
layer to raise an
exception (e.g., invalid column/attribute), which propagates back through
`list_datasets`
and results in a tool error instead of a successful dataset list response.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/dataset/tool/list_datasets.py
**Line:** 131:161
**Comment:**
*Logic Error: The dataset list tool passes `select_columns` directly
(including `popularity_score`) into `ModelListCore.run_tool` and thus to
`DatasetDAO.list(columns=...)`, but `popularity_score` is computed rather than
stored, so selecting it will cause the DAO query to fail instead of letting the
later scoring code augment the results.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=7142293abb5cf7b4079590f56637e15d10c483c55e839ff2a48833c280c1515a&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=7142293abb5cf7b4079590f56637e15d10c483c55e839ff2a48833c280c1515a&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]