codeant-ai-for-open-source[bot] commented on code in PR #38416:
URL: https://github.com/apache/superset/pull/38416#discussion_r2910653665


##########
superset/mcp_service/dataset/tool/list_datasets.py:
##########
@@ -115,32 +121,45 @@ def _serialize_dataset(
             """Serialize dataset (filtering via model_serializer)."""
             return serialize_dataset_object(obj)
 
-        # Create tool with standard serialization
-        tool = ModelListCore(
-            dao_class=DatasetDAO,
-            output_schema=DatasetInfo,
-            item_serializer=_serialize_dataset,
-            filter_type=DatasetFilter,
-            default_columns=DEFAULT_DATASET_COLUMNS,
-            search_columns=["schema", "sql", "table_name", "uuid"],
-            list_field_name="datasets",
-            output_list_schema=DatasetList,
-            all_columns=all_columns,
-            sortable_columns=DATASET_SORTABLE_COLUMNS,
-            logger=logger,
-        )
-
-        with event_logger.log_context(action="mcp.list_datasets.query"):
-            result = tool.run_tool(
-                filters=request.filters,
-                search=request.search,
-                select_columns=request.select_columns,
-                order_column=request.order_column,
-                order_direction=request.order_direction,
-                page=max(request.page - 1, 0),
-                page_size=request.page_size,
+        # Two-pass approach when sorting by popularity_score
+        if request.order_column == "popularity_score":
+            with 
event_logger.log_context(action="mcp.list_datasets.popularity_sort"):
+                result = _list_datasets_by_popularity(
+                    request, DatasetDAO, _serialize_dataset, all_columns, ctx
+                )
+        else:
+            # Create tool with standard serialization
+            list_core = ModelListCore(
+                dao_class=DatasetDAO,
+                output_schema=DatasetInfo,
+                item_serializer=_serialize_dataset,
+                filter_type=DatasetFilter,
+                default_columns=DEFAULT_DATASET_COLUMNS,
+                search_columns=DATASET_SEARCH_COLUMNS,
+                list_field_name="datasets",
+                output_list_schema=DatasetList,
+                all_columns=all_columns,
+                sortable_columns=DATASET_SORTABLE_COLUMNS,
+                logger=logger,
             )
 
+            with event_logger.log_context(action="mcp.list_datasets.query"):
+                result = list_core.run_tool(
+                    filters=request.filters,
+                    search=request.search,
+                    select_columns=request.select_columns,
+                    order_column=request.order_column,
+                    order_direction=request.order_direction,
+                    page=max(request.page - 1, 0),
+                    page_size=request.page_size,
+                )
+
+            # Attach popularity scores if requested in select_columns
+            if request.select_columns and "popularity_score" in 
request.select_columns:
+                if ds_ids := [d.id for d in result.datasets if d.id is not 
None]:
+                    scores = compute_dataset_popularity(ds_ids)
+                    attach_popularity_scores(result.datasets, scores)

Review Comment:
   **Suggestion:** The dataset list tool passes `select_columns` directly 
(including `popularity_score`) into `ModelListCore.run_tool` and thus to 
`DatasetDAO.list(columns=...)`, but `popularity_score` is computed rather than 
stored, so selecting it will cause the DAO query to fail instead of letting the 
later scoring code augment the results. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ list_datasets fails when selecting popularity_score column.
   - ⚠️ MCP clients cannot retrieve dataset popularity metadata.
   - ⚠️ New popularity_score feature unusable without workaround for datasets.
   ```
   </details>
   
   ```suggestion
               else:
                   # Create tool with standard serialization
                   list_core = ModelListCore(
                       dao_class=DatasetDAO,
                       output_schema=DatasetInfo,
                       item_serializer=_serialize_dataset,
                       filter_type=DatasetFilter,
                       default_columns=DEFAULT_DATASET_COLUMNS,
                       search_columns=DATASET_SEARCH_COLUMNS,
                       list_field_name="datasets",
                       output_list_schema=DatasetList,
                       all_columns=all_columns,
                       sortable_columns=DATASET_SORTABLE_COLUMNS,
                       logger=logger,
                   )
   
                   # popularity_score is computed, so do not request it as a 
physical column
                   original_select_columns = list(request.select_columns or [])
                   dao_select_columns = [
                       col for col in original_select_columns if col != 
"popularity_score"
                   ] or None
   
                   with 
event_logger.log_context(action="mcp.list_datasets.query"):
                       result = list_core.run_tool(
                           filters=request.filters,
                           search=request.search,
                           select_columns=dao_select_columns,
                           order_column=request.order_column,
                           order_direction=request.order_direction,
                           page=max(request.page - 1, 0),
                           page_size=request.page_size,
                       )
   
                   # Keep columns_requested aligned with what the client asked 
for
                   if original_select_columns:
                       result.columns_requested = original_select_columns
   
                   # Attach popularity scores if requested in select_columns
                   if "popularity_score" in original_select_columns:
                       if ds_ids := [d.id for d in result.datasets if d.id is 
not None]:
                           scores = compute_dataset_popularity(ds_ids)
                           attach_popularity_scores(result.datasets, scores)
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Start the MCP server (Superset MCP service) so tools from 
`superset.mcp_service.app`
   are available, including the `list_datasets` tool which is implemented in
   `superset/mcp_service/dataset/tool/list_datasets.py:70-183`.
   
   2. From an MCP client (mirroring the unit tests in
   
`superset/tests/unit_tests/mcp_service/dataset/tool/test_dataset_tools.py:1121-1131`),
   construct a `ListDatasetsRequest` (schema in
   `superset/mcp_service/dataset/schemas.py:212-255`) with 
`select_columns=["id",
   "table_name", "schema", "uuid", "popularity_score"]`, leaving 
`order_column=None` (so
   popularity sorting is NOT requested), and call the `list_datasets` tool via
   `Client.call_tool("list_datasets", {"request": request.model_dump()})`.
   
   3. The `list_datasets` function in
   `superset/mcp_service/dataset/tool/list_datasets.py:105-161` executes: 
because
   `request.order_column` is not `"popularity_score"`, it goes into the `else:` 
branch at
   line 131, constructs `ModelListCore`, and calls `list_core.run_tool(...,
   select_columns=request.select_columns, ...)` at lines 146-155 with 
`select_columns` still
   containing `"popularity_score"`.
   
   4. Inside `ModelListCore.run_tool` in 
`superset/mcp_service/mcp_core.py:135-174`,
   `select_columns` is parsed and used as `columns_to_load`, and the method 
calls
   `DatasetDAO.list(..., columns=columns_to_load)` at lines 165-174; the 
injected
   `DatasetDAO` implementation (from `superset/daos/dataset.py:46` via
   `superset.daos.base.BaseDAO`) attempts to build a SQLAlchemy query selecting 
a column
   named `popularity_score` on the `SqlaTable` model, but `popularity_score` is 
not a real
   ORM column (it is only defined as a Pydantic field on `DatasetInfo` in
   `superset/mcp_service/dataset/schemas.py:93-145`), causing the DAO/query 
layer to raise an
   exception (e.g., invalid column/attribute), which propagates back through 
`list_datasets`
   and results in a tool error instead of a successful dataset list response.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/dataset/tool/list_datasets.py
   **Line:** 131:161
   **Comment:**
        *Logic Error: The dataset list tool passes `select_columns` directly 
(including `popularity_score`) into `ModelListCore.run_tool` and thus to 
`DatasetDAO.list(columns=...)`, but `popularity_score` is computed rather than 
stored, so selecting it will cause the DAO query to fail instead of letting the 
later scoring code augment the results.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=7142293abb5cf7b4079590f56637e15d10c483c55e839ff2a48833c280c1515a&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=7142293abb5cf7b4079590f56637e15d10c483c55e839ff2a48833c280c1515a&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to