Re: [PR] feat(mcp): add popularity_score computed field for charts, dashboards, datasets [superset]

via GitHub Mon, 16 Mar 2026 15:52:30 -0700


codeant-ai-for-open-source[bot] commented on code in PR #38416:
URL: https://github.com/apache/superset/pull/38416#discussion_r2943419643



##########
superset/mcp_service/dashboard/tool/list_dashboards.py:
##########
@@ -155,3 +196,85 @@ def _serialize_dashboard(
         return result.model_dump(
             mode="json", context={"select_columns": columns_to_filter}
         )
+
+
+def _list_dashboards_by_popularity(
+    request: ListDashboardsRequest,
+    dao_class: Any,
+    serializer: Callable[..., dict[str, Any] | None],
+    all_columns: list[str],
+    ctx: Context,
+) -> DashboardList:
+    """Two-pass listing: sort all matching dashboards by popularity score."""
+    from superset.mcp_service.common.schema_discovery import (
+        DASHBOARD_SORTABLE_COLUMNS,
+    )
+
+    sorted_ids, scores, total_count = get_popularity_sorted_ids(
+        compute_fn=compute_dashboard_popularity,
+        dao_class=dao_class,
+        filters=request.filters,
+        search=request.search,
+        search_columns=DASHBOARD_SEARCH_COLUMNS,
+        order_direction=request.order_direction,
+    )
+
+    # Apply pagination to sorted IDs
+    page = max(request.page - 1, 0)
+    page_size = request.page_size
+    start = page * page_size
+    end = start + page_size
+
+    # Fetch full models for page IDs
+    if page_ids := sorted_ids[start:end]:
+        items = dao_class.find_by_ids(page_ids)
+        id_to_item = {item.id: item for item in items}
+        ordered_items = [id_to_item[pid] for pid in page_ids if pid in 
id_to_item]
+    else:
+        ordered_items = []
+
+    # Serialize - preserve the original request columns for response filtering
+    columns_requested = request.select_columns or DEFAULT_DASHBOARD_COLUMNS
+    # Include popularity_score in response when sorting by it, even if not
+    # explicitly in select_columns (so clients can see the sort key)
+    if "popularity_score" not in columns_requested:
+        columns_requested = list(columns_requested) + ["popularity_score"]
+    # Expand select_columns for internal loading (need popularity_score for
+    # attach step)
+    select_columns = list(columns_requested)
+
+    dash_objs = []
+    for item in ordered_items:
+        obj = serializer(item, select_columns)
+        if obj is not None:
+            dash_objs.append(obj)
+
+    attach_popularity_scores(dash_objs, scores)
+
+    total_pages = (total_count + page_size - 1) // page_size if page_size > 0 
else 0
+    pagination_info = PaginationInfo(
+        page=page,
+        page_size=page_size,
+        total_count=total_count,
+        total_pages=total_pages,
+        has_next=page < total_pages - 1,
+        has_previous=page > 0,
+    )
+
+    return DashboardList(
+        dashboards=dash_objs,
+        count=len(dash_objs),
+        total_count=total_count,
+        page=page,
+        page_size=page_size,
+        total_pages=total_pages,
+        has_previous=page > 0,
+        has_next=page < total_pages - 1,
+        columns_requested=columns_requested,
+        columns_loaded=select_columns,
+        columns_available=all_columns,
+        sortable_columns=DASHBOARD_SORTABLE_COLUMNS,
+        filters_applied=request.filters if isinstance(request.filters, list) 
else [],
+        pagination=pagination_info,
+        timestamp=datetime.now(timezone.utc),
+    )

Review Comment:
   **Suggestion:** The popularity-sorted path returns pagination with a 0-based 
page index, while the normal listing path and request schema use 1-based pages. 
This causes inconsistent API behavior and breaks clients that rely on `page=1` 
for the first page. Convert the response page value in this branch to 1-based 
before building `PaginationInfo` and `DashboardList`. [off-by-one]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ `list_dashboards` popularity sort returns wrong page numbering.
   - ❌ Clients mis-handle pagination boundaries on sorted dashboard results.
   ```
   </details>
   
   ```suggestion
       total_pages = (total_count + page_size - 1) // page_size if page_size > 
0 else 0
       page_1based = page + 1
       pagination_info = PaginationInfo(
           page=page_1based,
           page_size=page_size,
           total_count=total_count,
           total_pages=total_pages,
           has_next=page < total_pages - 1,
           has_previous=page > 0,
       )
   
       return DashboardList(
           dashboards=dash_objs,
           count=len(dash_objs),
           total_count=total_count,
           page=page_1based,
           page_size=page_size,
           total_pages=total_pages,
           has_previous=page > 0,
           has_next=page < total_pages - 1,
           columns_requested=columns_requested,
           columns_loaded=select_columns,
           columns_available=all_columns,
           sortable_columns=DASHBOARD_SORTABLE_COLUMNS,
           filters_applied=request.filters if isinstance(request.filters, list) 
else [],
           pagination=pagination_info,
           timestamp=datetime.now(timezone.utc),
       )
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Call MCP tool `list_dashboards` (registered via 
`superset/mcp_service/app.py:405-420`,
   imported from `superset/mcp_service/dashboard/tool/__init__.py:21`) with
   `order_column="popularity_score"`, `page=1`, `page_size=10`.
   
   2. Request validation uses `ListDashboardsRequest.page` documented as 1-based
   (`superset/mcp_service/dashboard/schemas.py:242-245`), so clients correctly 
send `page=1`
   for first page.
   
   3. In `superset/mcp_service/dashboard/tool/list_dashboards.py:133-137`, 
popularity sort
   routes to `_list_dashboards_by_popularity()` instead of 
`ModelListCore.run_tool()`.
   
   4. `_list_dashboards_by_popularity()` converts to zero-based index (`page =
   max(request.page - 1, 0)` at line 223) and then returns response with
   `PaginationInfo.page=page` and `DashboardList.page=page` (`lines 255-269`), 
producing
   `page=0` for first page.
   
   5. Compare with normal non-popularity path: `ModelListCore.run_tool()` 
explicitly converts
   back to 1-based (`superset/mcp_service/mcp_core.py:184-209`), proving this 
branch is
   inconsistent and off-by-one.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/dashboard/tool/list_dashboards.py
   **Line:** 254:280
   **Comment:**
        *Off By One: The popularity-sorted path returns pagination with a 
0-based page index, while the normal listing path and request schema use 
1-based pages. This causes inconsistent API behavior and breaks clients that 
rely on `page=1` for the first page. Convert the response page value in this 
branch to 1-based before building `PaginationInfo` and `DashboardList`.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=bc68393e415695ed2152669767156ecdceb8fe02b2b0ef9580857d564f0317ab&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=bc68393e415695ed2152669767156ecdceb8fe02b2b0ef9580857d564f0317ab&reaction=dislike'>👎</a>



##########
superset/mcp_service/dataset/tool/get_dataset_info.py:
##########
@@ -109,6 +110,11 @@ async def get_dataset_info(
             result = tool.run_tool(request.identifier)
 
         if isinstance(result, DatasetInfo):
+            # Compute popularity_score for single dataset retrieval
+            if result.id is not None:
+                scores = compute_dataset_popularity([result.id])
+                result.popularity_score = scores.get(result.id, 0.0)

Review Comment:
   **Suggestion:** Popularity scoring is a non-critical enrichment, but it 
currently runs inside the main `try` block without isolation. If 
`compute_dataset_popularity` raises (for example due to a transient DB error), 
the whole endpoint returns `DatasetError` instead of the successfully fetched 
dataset. Wrap only the scoring step in its own `try/except` and default to 
`0.0` so dataset retrieval remains reliable. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ get_dataset_info returns InternalError after successful lookup.
   - ❌ Chart creation workflow blocked at dataset inspection.
   - ⚠️ Optional popularity enrichment becomes hard failure dependency.
   ```
   </details>
   
   ```suggestion
               if result.id is not None:
                   try:
                       scores = compute_dataset_popularity([result.id])
                       result.popularity_score = scores.get(result.id, 0.0)
                   except Exception as score_error:
                       logger.warning(
                           "Failed to compute dataset popularity_score for 
id=%s: %s",
                           result.id,
                           score_error,
                       )
                       result.popularity_score = 0.0
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Start MCP service where `get_dataset_info` is registered by import in
   `superset/mcp_service/app.py:410-413`, then call MCP tool `get_dataset_info` 
(documented
   in workflow at `superset/mcp_service/app.py:93-96`).
   
   2. Request enters `get_dataset_info()` at
   `superset/mcp_service/dataset/tool/get_dataset_info.py:48-50`; dataset fetch 
succeeds via
   `ModelGetInfoCore.run_tool(request.identifier)` at `get_dataset_info.py:110` 
and returns
   `DatasetInfo` (normal path in `superset/mcp_service/mcp_core.py:299-342`).
   
   3. On that same success path, code always executes popularity enrichment at
   `get_dataset_info.py:114-116`, calling 
`compute_dataset_popularity([result.id])`.
   
   4. If popularity query fails (e.g., DB/SQLAlchemy error during 
`db.session.query(...)` in
   `superset/mcp_service/common/popularity.py:64-74` or `:82-85`), exception 
bubbles to outer
   handler at `get_dataset_info.py:136-149`, converting response to
   `DatasetError(InternalError)` even though dataset lookup already succeeded.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/dataset/tool/get_dataset_info.py
   **Line:** 114:116
   **Comment:**
        *Logic Error: Popularity scoring is a non-critical enrichment, but it 
currently runs inside the main `try` block without isolation. If 
`compute_dataset_popularity` raises (for example due to a transient DB error), 
the whole endpoint returns `DatasetError` instead of the successfully fetched 
dataset. Wrap only the scoring step in its own `try/except` and default to 
`0.0` so dataset retrieval remains reliable.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=bdfe3d80c51546fdc3a0c1233ecf8ab6db88c74595417c897041ce84b792a47b&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=bdfe3d80c51546fdc3a0c1233ecf8ab6db88c74595417c897041ce84b792a47b&reaction=dislike'>👎</a>



##########
superset/mcp_service/chart/tool/list_charts.py:
##########
@@ -156,3 +208,84 @@ def _serialize_chart(
     except Exception as e:
         await ctx.error("Failed to list charts: %s" % (str(e),))
         raise
+
+
+def _list_charts_by_popularity(
+    request: ListChartsRequest,
+    dao_class: Any,
+    serializer: Callable[..., dict[str, Any] | None],
+    all_columns: list[str],
+    ctx: Context,
+) -> ChartList:
+    """Two-pass listing: sort all matching charts by popularity score."""
+    from superset.mcp_service.common.schema_discovery import 
CHART_SORTABLE_COLUMNS
+
+    sorted_ids, scores, total_count = get_popularity_sorted_ids(
+        compute_fn=compute_chart_popularity,
+        dao_class=dao_class,
+        filters=request.filters,
+        search=request.search,
+        search_columns=CHART_SEARCH_COLUMNS,
+        order_direction=request.order_direction,
+    )
+
+    # Apply pagination to sorted IDs
+    page = max(request.page - 1, 0)
+    page_size = request.page_size
+    start = page * page_size
+    end = start + page_size
+
+    # Fetch full models for page IDs
+    if page_ids := sorted_ids[start:end]:
+        items = dao_class.find_by_ids(page_ids)
+        # Preserve popularity sort order
+        id_to_item = {item.id: item for item in items}
+        ordered_items = [id_to_item[pid] for pid in page_ids if pid in 
id_to_item]
+    else:
+        ordered_items = []
+
+    # Serialize - preserve the original request columns for response filtering
+    columns_requested = request.select_columns or DEFAULT_CHART_COLUMNS
+    # Include popularity_score in response when sorting by it, even if not
+    # explicitly in select_columns (so clients can see the sort key)
+    if "popularity_score" not in columns_requested:
+        columns_requested = list(columns_requested) + ["popularity_score"]
+    # Expand select_columns for internal loading
+    select_columns = list(columns_requested)
+
+    chart_objs = []
+    for item in ordered_items:
+        obj = serializer(item, select_columns)
+        if obj is not None:
+            chart_objs.append(obj)
+
+    # Attach scores
+    attach_popularity_scores(chart_objs, scores)
+
+    total_pages = (total_count + page_size - 1) // page_size if page_size > 0 
else 0
+    pagination_info = PaginationInfo(
+        page=page,
+        page_size=page_size,
+        total_count=total_count,
+        total_pages=total_pages,
+        has_next=page < total_pages - 1,
+        has_previous=page > 0,
+    )
+
+    return ChartList(
+        charts=chart_objs,
+        count=len(chart_objs),
+        total_count=total_count,
+        page=page,
+        page_size=page_size,
+        total_pages=total_pages,
+        has_previous=page > 0,
+        has_next=page < total_pages - 1,
+        columns_requested=columns_requested,

Review Comment:
   **Suggestion:** The popularity-sorted path returns zero-based `page` values, 
but the rest of the MCP list stack and request contract are 1-based in 
responses. This causes inconsistent pagination metadata depending on sort mode 
and can break clients that rely on a stable page convention. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Popularity-sorted chart pagination metadata becomes inconsistent.
   - ⚠️ Clients relying on 1-based paging may mis-navigate.
   - ❌ First popularity page can report page zero.
   ```
   </details>
   
   ```suggestion
       total_pages = (total_count + page_size - 1) // page_size if page_size > 
0 else 0
       page_1based = page + 1
       pagination_info = PaginationInfo(
           page=page_1based,
           page_size=page_size,
           total_count=total_count,
           total_pages=total_pages,
           has_next=page < total_pages - 1,
           has_previous=page > 0,
       )
   
       return ChartList(
           charts=chart_objs,
           count=len(chart_objs),
           total_count=total_count,
           page=page_1based,
           page_size=page_size,
           total_pages=total_pages,
           has_previous=page > 0,
           has_next=page < total_pages - 1,
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Call MCP tool `list_charts` (registered in
   `superset/mcp_service/chart/tool/list_charts.py:86-88`) with
   `order_column="popularity_score"` and `page=1`.
   
   2. Execution takes popularity branch at
   `superset/mcp_service/chart/tool/list_charts.py:139-143`, which calls
   `_list_charts_by_popularity`.
   
   3. `_list_charts_by_popularity` converts request page to zero-based at
   `superset/mcp_service/chart/tool/list_charts.py:233` (`page=max(request.page 
- 1, 0)`),
   then returns `PaginationInfo.page=page` and `ChartList.page=page` at lines 
`267` and
   `279`.
   
   4. Compare with standard path: `ModelListCore.run_tool` explicitly returns 
1-based
   response page (`page_1based = page + 1`) at 
`superset/mcp_service/mcp_core.py:184-209`;
   also app instructions say list tools are 1-based 
(`superset/mcp_service/app.py:62`,
   `175`). Result: popularity-sorted responses return `page=0` for first page, 
inconsistent
   with non-popularity responses.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/chart/tool/list_charts.py
   **Line:** 265:284
   **Comment:**
        *Logic Error: The popularity-sorted path returns zero-based `page` 
values, but the rest of the MCP list stack and request contract are 1-based in 
responses. This causes inconsistent pagination metadata depending on sort mode 
and can break clients that rely on a stable page convention.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=828fa53cd7c65ca1b891d2178835244940e7a296a5480d71bf76673fadd950bc&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=828fa53cd7c65ca1b891d2178835244940e7a296a5480d71bf76673fadd950bc&reaction=dislike'>👎</a>



##########
superset/mcp_service/dataset/tool/list_datasets.py:
##########
@@ -176,3 +220,82 @@ def _serialize_dataset(
             )
         )
         raise
+
+
+def _list_datasets_by_popularity(
+    request: ListDatasetsRequest,
+    dao_class: Any,
+    serializer: Callable[..., dict[str, Any] | None],
+    all_columns: list[str],
+    ctx: Context,
+) -> DatasetList:
+    """Two-pass listing: sort all matching datasets by popularity score."""
+    from superset.mcp_service.common.schema_discovery import 
DATASET_SORTABLE_COLUMNS
+
+    sorted_ids, scores, total_count = get_popularity_sorted_ids(
+        compute_fn=compute_dataset_popularity,
+        dao_class=dao_class,
+        filters=request.filters,
+        search=request.search,
+        search_columns=DATASET_SEARCH_COLUMNS,
+        order_direction=request.order_direction,
+    )
+
+    # Apply pagination to sorted IDs
+    page = max(request.page - 1, 0)
+    page_size = request.page_size
+    start = page * page_size
+    end = start + page_size

Review Comment:
   **Suggestion:** The popularity-sort branch returns zero-based pagination 
metadata (`page=0` for the first page), but the request schema and the standard 
`ModelListCore` path use one-based pages. This creates an API contract break 
where the same endpoint reports different page numbering depending on sort 
mode. Convert the internal page index back to one-based before building 
`PaginationInfo` and `DatasetList`. [off-by-one]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Popularity-sorted dataset pagination reports wrong page number.
   - ⚠️ Clients relying on 1-based pagination mis-handle navigation.
   - ⚠️ Same endpoint behaves differently by sort mode.
   ```
   </details>
   
   ```suggestion
       page_index = max(request.page - 1, 0)
       page_size = request.page_size
       start = page_index * page_size
       end = start + page_size
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Call MCP tool `list_datasets` (tool entrypoint at
   `superset/mcp_service/dataset/tool/list_datasets.py:75-77`) with request 
`page=1`,
   `page_size=10`, `order_column="popularity_score"`.
   
   2. Request follows popularity branch at `list_datasets.py:137-142` and 
invokes
   `_list_datasets_by_popularity(...)`.
   
   3. `_list_datasets_by_popularity` converts to zero-based index (`page = 
max(request.page -
   1, 0)`) at `list_datasets.py:245`, then returns `DatasetList.page=page` and
   `PaginationInfo.page=page` at `list_datasets.py:276-290`.
   
   4. Observe response `page=0` for first page, which conflicts with declared 
1-based
   contract in `ListDatasetsRequest` 
(`superset/mcp_service/dataset/schemas.py:251-254`) and
   standard path behavior in `ModelListCore` 
(`superset/mcp_service/mcp_core.py:184-209`).
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/dataset/tool/list_datasets.py
   **Line:** 245:248
   **Comment:**
        *Off By One: The popularity-sort branch returns zero-based pagination 
metadata (`page=0` for the first page), but the request schema and the standard 
`ModelListCore` path use one-based pages. This creates an API contract break 
where the same endpoint reports different page numbering depending on sort 
mode. Convert the internal page index back to one-based before building 
`PaginationInfo` and `DatasetList`.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=f85ebc4b585b1432fa244998e5a2ada8934a35fdda12a6d34d15175fa9f8190e&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=f85ebc4b585b1432fa244998e5a2ada8934a35fdda12a6d34d15175fa9f8190e&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat(mcp): add popularity_score computed field for charts, dashboards, datasets [superset]

Reply via email to