codeant-ai-for-open-source[bot] commented on code in PR #38416:
URL: https://github.com/apache/superset/pull/38416#discussion_r2930742920


##########
superset/mcp_service/chart/tool/list_charts.py:
##########
@@ -142,9 +188,16 @@ def _serialize_chart(
         )
 
         # Apply field filtering via serialization context
-        # Always use columns_requested (either explicit select_columns or 
defaults)
-        # This triggers ChartInfo._filter_fields_by_context for each chart
         columns_to_filter = result.columns_requested
+        # Re-add popularity_score if it was originally requested
+        # (it was stripped before the DAO query since it's computed)
+        if (
+            request.select_columns
+            and "popularity_score" in request.select_columns
+            and columns_to_filter
+            and "popularity_score" not in columns_to_filter
+        ):
+            columns_to_filter = list(columns_to_filter) + ["popularity_score"]

Review Comment:
   **Suggestion:** Internal column injection for score computation leaks into 
the response shape. When a client requests only `popularity_score`, the code 
adds `id` to `select_columns` for DAO loading and then reuses those mutated 
columns for output filtering, so `id` is returned even though it was not 
requested. Build response filtering from the original request columns instead 
of DAO-expanded columns. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ list_charts response shape violates requested select_columns contract.
   - ⚠️ Unrequested id field increases MCP response payload tokens.
   - ⚠️ Client-side strict column parsers may mis-handle responses.
   ```
   </details>
   
   ```suggestion
           columns_to_filter = (
               list(request.select_columns)
               if request.select_columns
               else result.columns_requested
           )
           # Re-add popularity_score if it was originally requested
           # (it was stripped before the DAO query since it's computed)
           if (
               request.select_columns
               and "popularity_score" in request.select_columns
               and columns_to_filter
               and "popularity_score" not in columns_to_filter
           ):
               columns_to_filter = list(columns_to_filter) + 
["popularity_score"]
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. MCP registers `list_charts` as a callable tool via imports in
   `superset/mcp_service/app.py:393-399`; registration is verified in
   `tests/unit_tests/mcp_service/test_mcp_tool_registration.py:31`.
   
   2. Call MCP `tools/call` with `name=\"list_charts\"` and params 
`{\"select_columns\":
   [\"popularity_score\"]}`; middleware resolves tool by message name at
   `superset/mcp_service/middleware.py:256`.
   
   3. In `superset/mcp_service/chart/tool/list_charts.py:155-164`, code removes
   `popularity_score` for DAO, then injects `"id"` into `dao_columns` to 
compute scores.
   
   4. `ModelListCore.run_tool()` sets `columns_requested = select_columns` at
   `superset/mcp_service/mcp_core.py:154-160`, so response metadata now carries 
`["id"]`
   (internal column) rather than original request.
   
   5. Back in `list_charts`, filtering uses `result.columns_requested` at
   `superset/mcp_service/chart/tool/list_charts.py:191`, then re-adds 
`popularity_score` at
   `:194-200`, resulting in `["id","popularity_score"]`.
   
   6. Serializer enforces context-based field output in
   `superset/mcp_service/chart/schemas.py:151-155`, so response includes leaked 
`id` even
   though caller requested only `popularity_score`.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/chart/tool/list_charts.py
   **Line:** 191:200
   **Comment:**
        *Logic Error: Internal column injection for score computation leaks 
into the response shape. When a client requests only `popularity_score`, the 
code adds `id` to `select_columns` for DAO loading and then reuses those 
mutated columns for output filtering, so `id` is returned even though it was 
not requested. Build response filtering from the original request columns 
instead of DAO-expanded columns.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=7fb7a688a22b462515eeb65bb2133a9e12056269b72dfa413a13a777c89b340c&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=7fb7a688a22b462515eeb65bb2133a9e12056269b72dfa413a13a777c89b340c&reaction=dislike'>👎</a>



##########
superset/mcp_service/dashboard/tool/list_dashboards.py:
##########
@@ -144,9 +179,16 @@ def _serialize_dashboard(
     )
 
     # Apply field filtering via serialization context
-    # Always use columns_requested (either explicit select_columns or defaults)
-    # This triggers DashboardInfo._filter_fields_by_context for each dashboard
     columns_to_filter = result.columns_requested
+    # Re-add popularity_score if it was originally requested
+    # (it was stripped before the DAO query since it's computed)
+    if (
+        request.select_columns
+        and "popularity_score" in request.select_columns
+        and columns_to_filter
+        and "popularity_score" not in columns_to_filter
+    ):
+        columns_to_filter = list(columns_to_filter) + ["popularity_score"]

Review Comment:
   **Suggestion:** Injecting `id` into `dao_columns` for score computation 
leaks `id` into the response when clients request `popularity_score` without 
`id`, because field filtering later uses `result.columns_requested` (which now 
contains the injected column). Build `columns_to_filter` from the original 
`request.select_columns` when provided so only explicitly requested fields are 
returned. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ list_dashboards returns unrequested `id` field.
   - ⚠️ select_columns contract breaks for dashboard item fields.
   - ⚠️ MCP clients may mis-handle unexpected response shape.
   ```
   </details>
   
   ```suggestion
       columns_to_filter = (
           list(request.select_columns)
           if request.select_columns
           else result.columns_requested
       )
       # Re-add popularity_score if it was originally requested
       # (it was stripped before the DAO query since it's computed)
       if (
           request.select_columns
           and "popularity_score" in request.select_columns
           and columns_to_filter
           and "popularity_score" not in columns_to_filter
       ):
           columns_to_filter = list(columns_to_filter) + ["popularity_score"]
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Start MCP server where `list_dashboards` is registered via import in
   `superset/mcp_service/app.py:23-28`.
   
   2. Call tool `list_dashboards` with
   `select_columns=["dashboard_title","popularity_score"]` and default sorting 
(not
   `order_column="popularity_score"`), entering standard path in
   `superset/mcp_service/dashboard/tool/list_dashboards.py:133-166`.
   
   3. In that path, code strips computed field then injects `id`
   (`list_dashboards.py:149-155`) and passes injected list to 
`ModelListCore.run_tool(...)`
   (`list_dashboards.py:157-166`).
   
   4. `ModelListCore` sets `columns_requested = select_columns` exactly as 
passed
   (`superset/mcp_service/mcp_core.py:154-160`), so response metadata now 
includes injected
   `id`.
   
   5. Final serialization uses `columns_to_filter = result.columns_requested`
   (`list_dashboards.py:182`) and dashboard field filtering by context
   (`superset/mcp_service/dashboard/schemas.py:373-379`), causing each 
dashboard object to
   include `id` although client did not request it.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/dashboard/tool/list_dashboards.py
   **Line:** 182:191
   **Comment:**
        *Logic Error: Injecting `id` into `dao_columns` for score computation 
leaks `id` into the response when clients request `popularity_score` without 
`id`, because field filtering later uses `result.columns_requested` (which now 
contains the injected column). Build `columns_to_filter` from the original 
`request.select_columns` when provided so only explicitly requested fields are 
returned.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=18965dae8a03634fe54c7be5f711f6bf666169c9e53993275402d8dc5c539c7a&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=18965dae8a03634fe54c7be5f711f6bf666169c9e53993275402d8dc5c539c7a&reaction=dislike'>👎</a>



##########
superset/mcp_service/dataset/tool/list_datasets.py:
##########
@@ -152,9 +190,16 @@ def _serialize_dataset(
         )
 
         # Apply field filtering via serialization context
-        # Always use columns_requested (either explicit select_columns or 
defaults)
-        # This triggers DatasetInfo._filter_fields_by_context for each dataset
         columns_to_filter = result.columns_requested
+        # Re-add popularity_score if it was originally requested
+        # (it was stripped before the DAO query since it's computed)
+        if (
+            request.select_columns
+            and "popularity_score" in request.select_columns
+            and columns_to_filter
+            and "popularity_score" not in columns_to_filter
+        ):
+            columns_to_filter = list(columns_to_filter) + ["popularity_score"]

Review Comment:
   **Suggestion:** Internal helper columns are leaking into API output. When 
`select_columns=["popularity_score"]`, the code injects `id` for internal 
scoring and later uses `result.columns_requested` for response filtering, so 
`id` is returned even though it was not requested. Build `columns_to_filter` 
from the original request columns instead of the DAO-loaded columns. [logic 
error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ list_datasets returns unrequested `id` field.
   - ⚠️ select_columns contract becomes inconsistent for clients.
   ```
   </details>
   
   ```suggestion
           columns_to_filter = (
               list(request.select_columns)
               if request.select_columns
               else result.columns_requested
           )
           # Re-add popularity_score if it was originally requested
           # (it was stripped before the DAO query since it's computed)
           if (
               request.select_columns
               and "popularity_score" in request.select_columns
               and columns_to_filter
               and "popularity_score" not in columns_to_filter
           ):
               columns_to_filter = list(columns_to_filter) + 
["popularity_score"]
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Call MCP tool `list_datasets` through FastMCP client (same path used in 
tests at
   `tests/unit_tests/mcp_service/dataset/tool/test_dataset_tools.py:186-190`) 
with
   `select_columns=["popularity_score"]` and default ordering.
   
   2. Request reaches `list_datasets()` at
   `superset/mcp_service/dataset/tool/list_datasets.py:77`; non-popularity 
branch runs, and
   `dao_columns` injects `"id"` at `:154-164` when popularity is requested.
   
   3. `ModelListCore.run_tool()` stores injected columns as `columns_requested`
   (`superset/mcp_service/mcp_core.py:154-160,210`), so response metadata now 
thinks `"id"`
   was requested.
   
   4. Serialization filter uses `result.columns_requested` at 
`list_datasets.py:193`, then
   adds popularity at `:196-202`; dataset serializer
   (`superset/mcp_service/dataset/schemas.py:164-173`) returns both `id` and
   `popularity_score`, leaking unrequested `id`.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/dataset/tool/list_datasets.py
   **Line:** 193:202
   **Comment:**
        *Logic Error: Internal helper columns are leaking into API output. When 
`select_columns=["popularity_score"]`, the code injects `id` for internal 
scoring and later uses `result.columns_requested` for response filtering, so 
`id` is returned even though it was not requested. Build `columns_to_filter` 
from the original request columns instead of the DAO-loaded columns.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=b09680e2253c0db28b8efc5e12b3462298446bb7611fbdaa734f2137c81bc308&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=b09680e2253c0db28b8efc5e12b3462298446bb7611fbdaa734f2137c81bc308&reaction=dislike'>👎</a>



##########
superset/mcp_service/dataset/tool/list_datasets.py:
##########
@@ -176,3 +221,78 @@ def _serialize_dataset(
             )
         )
         raise
+
+
+def _list_datasets_by_popularity(
+    request: ListDatasetsRequest,
+    dao_class: Any,
+    serializer: Callable[..., dict[str, Any] | None],
+    all_columns: list[str],
+    ctx: Context,
+) -> DatasetList:
+    """Two-pass listing: sort all matching datasets by popularity score."""
+    from superset.mcp_service.common.schema_discovery import 
DATASET_SORTABLE_COLUMNS
+
+    sorted_ids, scores, total_count = get_popularity_sorted_ids(
+        compute_fn=compute_dataset_popularity,
+        dao_class=dao_class,
+        filters=request.filters,
+        search=request.search,
+        search_columns=DATASET_SEARCH_COLUMNS,
+        order_direction=request.order_direction,
+    )
+
+    # Apply pagination to sorted IDs
+    page = max(request.page - 1, 0)
+    page_size = request.page_size
+    start = page * page_size
+    end = start + page_size
+
+    # Fetch full models for page IDs
+    if page_ids := sorted_ids[start:end]:
+        items = dao_class.find_by_ids(page_ids)
+        id_to_item = {item.id: item for item in items}
+        ordered_items = [id_to_item[pid] for pid in page_ids if pid in 
id_to_item]
+    else:
+        ordered_items = []
+
+    # Serialize
+    select_columns = request.select_columns or DEFAULT_DATASET_COLUMNS
+    if "popularity_score" not in select_columns:
+        select_columns = list(select_columns) + ["popularity_score"]
+
+    ds_objs = []
+    for item in ordered_items:
+        obj = serializer(item, select_columns)
+        if obj is not None:
+            ds_objs.append(obj)
+
+    attach_popularity_scores(ds_objs, scores)
+
+    total_pages = (total_count + page_size - 1) // page_size if page_size > 0 
else 0
+    pagination_info = PaginationInfo(
+        page=page,
+        page_size=page_size,
+        total_count=total_count,
+        total_pages=total_pages,
+        has_next=page < total_pages - 1,
+        has_previous=page > 0,
+    )
+
+    return DatasetList(
+        datasets=ds_objs,
+        count=len(ds_objs),
+        total_count=total_count,
+        page=page,
+        page_size=page_size,
+        total_pages=total_pages,
+        has_previous=page > 0,
+        has_next=page < total_pages - 1,
+        columns_requested=select_columns,
+        columns_loaded=select_columns,
+        columns_available=all_columns,
+        sortable_columns=DATASET_SORTABLE_COLUMNS,
+        filters_applied=request.filters if isinstance(request.filters, list) 
else [],
+        pagination=pagination_info,
+        timestamp=datetime.now(timezone.utc),
+    )

Review Comment:
   **Suggestion:** Sorting by popularity currently forces `popularity_score` 
into `columns_requested` even when the client did not request it, which changes 
the default response shape and breaks `select_columns` semantics. Keep 
`popularity_score` only in loaded/internal columns and preserve requested 
columns separately. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Popularity sort changes default dataset response shape.
   - ⚠️ `columns_requested` no longer reflects client input.
   ```
   </details>
   
   ```suggestion
       # Serialize
       requested_columns = request.select_columns or DEFAULT_DATASET_COLUMNS
       loaded_columns = list(requested_columns)
       if "popularity_score" not in loaded_columns:
           loaded_columns.append("popularity_score")
   
       ds_objs = []
       for item in ordered_items:
           obj = serializer(item, loaded_columns)
           if obj is not None:
               ds_objs.append(obj)
   
       attach_popularity_scores(ds_objs, scores)
   
       total_pages = (total_count + page_size - 1) // page_size if page_size > 
0 else 0
       pagination_info = PaginationInfo(
           page=page,
           page_size=page_size,
           total_count=total_count,
           total_pages=total_pages,
           has_next=page < total_pages - 1,
           has_previous=page > 0,
       )
   
       return DatasetList(
           datasets=ds_objs,
           count=len(ds_objs),
           total_count=total_count,
           page=page,
           page_size=page_size,
           total_pages=total_pages,
           has_previous=page > 0,
           has_next=page < total_pages - 1,
           columns_requested=requested_columns,
           columns_loaded=loaded_columns,
           columns_available=all_columns,
           sortable_columns=DATASET_SORTABLE_COLUMNS,
           filters_applied=request.filters if isinstance(request.filters, list) 
else [],
           pagination=pagination_info,
           timestamp=datetime.now(timezone.utc),
       )
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Invoke `list_datasets` tool via MCP (tool is registered through
   `superset/mcp_service/app.py:29-32,93-95`) with 
`order_column="popularity_score"` and no
   `select_columns`.
   
   2. `list_datasets()` routes into `_list_datasets_by_popularity()` at
   `superset/mcp_service/dataset/tool/list_datasets.py:133-137`.
   
   3. In popularity path, code forces `"popularity_score"` into 
`select_columns` at
   `list_datasets.py:260-263` and sets both `columns_requested` and 
`columns_loaded` to this
   augmented list at `:291-292`.
   
   4. Outer serializer uses `result.columns_requested` 
(`list_datasets.py:193,208-210`), and
   dataset field filtering (`superset/mcp_service/dataset/schemas.py:164-173`) 
includes
   popularity_score in item output even though client did not provide 
`select_columns`.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/dataset/tool/list_datasets.py
   **Line:** 260:298
   **Comment:**
        *Logic Error: Sorting by popularity currently forces `popularity_score` 
into `columns_requested` even when the client did not request it, which changes 
the default response shape and breaks `select_columns` semantics. Keep 
`popularity_score` only in loaded/internal columns and preserve requested 
columns separately.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=58e32f81c83c0db0a83026134b66dac258e4790a4b044d0d0602713ede0c8a01&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=58e32f81c83c0db0a83026134b66dac258e4790a4b044d0d0602713ede0c8a01&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to