codeant-ai-for-open-source[bot] commented on code in PR #38416:
URL: https://github.com/apache/superset/pull/38416#discussion_r2943419643
##########
superset/mcp_service/dashboard/tool/list_dashboards.py:
##########
@@ -155,3 +196,85 @@ def _serialize_dashboard(
return result.model_dump(
mode="json", context={"select_columns": columns_to_filter}
)
+
+
+def _list_dashboards_by_popularity(
+ request: ListDashboardsRequest,
+ dao_class: Any,
+ serializer: Callable[..., dict[str, Any] | None],
+ all_columns: list[str],
+ ctx: Context,
+) -> DashboardList:
+ """Two-pass listing: sort all matching dashboards by popularity score."""
+ from superset.mcp_service.common.schema_discovery import (
+ DASHBOARD_SORTABLE_COLUMNS,
+ )
+
+ sorted_ids, scores, total_count = get_popularity_sorted_ids(
+ compute_fn=compute_dashboard_popularity,
+ dao_class=dao_class,
+ filters=request.filters,
+ search=request.search,
+ search_columns=DASHBOARD_SEARCH_COLUMNS,
+ order_direction=request.order_direction,
+ )
+
+ # Apply pagination to sorted IDs
+ page = max(request.page - 1, 0)
+ page_size = request.page_size
+ start = page * page_size
+ end = start + page_size
+
+ # Fetch full models for page IDs
+ if page_ids := sorted_ids[start:end]:
+ items = dao_class.find_by_ids(page_ids)
+ id_to_item = {item.id: item for item in items}
+ ordered_items = [id_to_item[pid] for pid in page_ids if pid in
id_to_item]
+ else:
+ ordered_items = []
+
+ # Serialize - preserve the original request columns for response filtering
+ columns_requested = request.select_columns or DEFAULT_DASHBOARD_COLUMNS
+ # Include popularity_score in response when sorting by it, even if not
+ # explicitly in select_columns (so clients can see the sort key)
+ if "popularity_score" not in columns_requested:
+ columns_requested = list(columns_requested) + ["popularity_score"]
+ # Expand select_columns for internal loading (need popularity_score for
+ # attach step)
+ select_columns = list(columns_requested)
+
+ dash_objs = []
+ for item in ordered_items:
+ obj = serializer(item, select_columns)
+ if obj is not None:
+ dash_objs.append(obj)
+
+ attach_popularity_scores(dash_objs, scores)
+
+ total_pages = (total_count + page_size - 1) // page_size if page_size > 0
else 0
+ pagination_info = PaginationInfo(
+ page=page,
+ page_size=page_size,
+ total_count=total_count,
+ total_pages=total_pages,
+ has_next=page < total_pages - 1,
+ has_previous=page > 0,
+ )
+
+ return DashboardList(
+ dashboards=dash_objs,
+ count=len(dash_objs),
+ total_count=total_count,
+ page=page,
+ page_size=page_size,
+ total_pages=total_pages,
+ has_previous=page > 0,
+ has_next=page < total_pages - 1,
+ columns_requested=columns_requested,
+ columns_loaded=select_columns,
+ columns_available=all_columns,
+ sortable_columns=DASHBOARD_SORTABLE_COLUMNS,
+ filters_applied=request.filters if isinstance(request.filters, list)
else [],
+ pagination=pagination_info,
+ timestamp=datetime.now(timezone.utc),
+ )
Review Comment:
**Suggestion:** The popularity-sorted path returns pagination with a 0-based
page index, while the normal listing path and request schema use 1-based pages.
This causes inconsistent API behavior and breaks clients that rely on `page=1`
for the first page. Convert the response page value in this branch to 1-based
before building `PaginationInfo` and `DashboardList`. [off-by-one]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ `list_dashboards` popularity sort returns wrong page numbering.
- ❌ Clients mis-handle pagination boundaries on sorted dashboard results.
```
</details>
```suggestion
total_pages = (total_count + page_size - 1) // page_size if page_size >
0 else 0
page_1based = page + 1
pagination_info = PaginationInfo(
page=page_1based,
page_size=page_size,
total_count=total_count,
total_pages=total_pages,
has_next=page < total_pages - 1,
has_previous=page > 0,
)
return DashboardList(
dashboards=dash_objs,
count=len(dash_objs),
total_count=total_count,
page=page_1based,
page_size=page_size,
total_pages=total_pages,
has_previous=page > 0,
has_next=page < total_pages - 1,
columns_requested=columns_requested,
columns_loaded=select_columns,
columns_available=all_columns,
sortable_columns=DASHBOARD_SORTABLE_COLUMNS,
filters_applied=request.filters if isinstance(request.filters, list)
else [],
pagination=pagination_info,
timestamp=datetime.now(timezone.utc),
)
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Call MCP tool `list_dashboards` (registered via
`superset/mcp_service/app.py:405-420`,
imported from `superset/mcp_service/dashboard/tool/__init__.py:21`) with
`order_column="popularity_score"`, `page=1`, `page_size=10`.
2. Request validation uses `ListDashboardsRequest.page` documented as 1-based
(`superset/mcp_service/dashboard/schemas.py:242-245`), so clients correctly
send `page=1`
for first page.
3. In `superset/mcp_service/dashboard/tool/list_dashboards.py:133-137`,
popularity sort
routes to `_list_dashboards_by_popularity()` instead of
`ModelListCore.run_tool()`.
4. `_list_dashboards_by_popularity()` converts to zero-based index (`page =
max(request.page - 1, 0)` at line 223) and then returns response with
`PaginationInfo.page=page` and `DashboardList.page=page` (`lines 255-269`),
producing
`page=0` for first page.
5. Compare with normal non-popularity path: `ModelListCore.run_tool()`
explicitly converts
back to 1-based (`superset/mcp_service/mcp_core.py:184-209`), proving this
branch is
inconsistent and off-by-one.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/dashboard/tool/list_dashboards.py
**Line:** 254:280
**Comment:**
*Off By One: The popularity-sorted path returns pagination with a
0-based page index, while the normal listing path and request schema use
1-based pages. This causes inconsistent API behavior and breaks clients that
rely on `page=1` for the first page. Convert the response page value in this
branch to 1-based before building `PaginationInfo` and `DashboardList`.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=bc68393e415695ed2152669767156ecdceb8fe02b2b0ef9580857d564f0317ab&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=bc68393e415695ed2152669767156ecdceb8fe02b2b0ef9580857d564f0317ab&reaction=dislike'>👎</a>
##########
superset/mcp_service/dataset/tool/get_dataset_info.py:
##########
@@ -109,6 +110,11 @@ async def get_dataset_info(
result = tool.run_tool(request.identifier)
if isinstance(result, DatasetInfo):
+ # Compute popularity_score for single dataset retrieval
+ if result.id is not None:
+ scores = compute_dataset_popularity([result.id])
+ result.popularity_score = scores.get(result.id, 0.0)
Review Comment:
**Suggestion:** Popularity scoring is a non-critical enrichment, but it
currently runs inside the main `try` block without isolation. If
`compute_dataset_popularity` raises (for example due to a transient DB error),
the whole endpoint returns `DatasetError` instead of the successfully fetched
dataset. Wrap only the scoring step in its own `try/except` and default to
`0.0` so dataset retrieval remains reliable. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ get_dataset_info returns InternalError after successful lookup.
- ❌ Chart creation workflow blocked at dataset inspection.
- ⚠️ Optional popularity enrichment becomes hard failure dependency.
```
</details>
```suggestion
if result.id is not None:
try:
scores = compute_dataset_popularity([result.id])
result.popularity_score = scores.get(result.id, 0.0)
except Exception as score_error:
logger.warning(
"Failed to compute dataset popularity_score for
id=%s: %s",
result.id,
score_error,
)
result.popularity_score = 0.0
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Start MCP service where `get_dataset_info` is registered by import in
`superset/mcp_service/app.py:410-413`, then call MCP tool `get_dataset_info`
(documented
in workflow at `superset/mcp_service/app.py:93-96`).
2. Request enters `get_dataset_info()` at
`superset/mcp_service/dataset/tool/get_dataset_info.py:48-50`; dataset fetch
succeeds via
`ModelGetInfoCore.run_tool(request.identifier)` at `get_dataset_info.py:110`
and returns
`DatasetInfo` (normal path in `superset/mcp_service/mcp_core.py:299-342`).
3. On that same success path, code always executes popularity enrichment at
`get_dataset_info.py:114-116`, calling
`compute_dataset_popularity([result.id])`.
4. If popularity query fails (e.g., DB/SQLAlchemy error during
`db.session.query(...)` in
`superset/mcp_service/common/popularity.py:64-74` or `:82-85`), exception
bubbles to outer
handler at `get_dataset_info.py:136-149`, converting response to
`DatasetError(InternalError)` even though dataset lookup already succeeded.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/dataset/tool/get_dataset_info.py
**Line:** 114:116
**Comment:**
*Logic Error: Popularity scoring is a non-critical enrichment, but it
currently runs inside the main `try` block without isolation. If
`compute_dataset_popularity` raises (for example due to a transient DB error),
the whole endpoint returns `DatasetError` instead of the successfully fetched
dataset. Wrap only the scoring step in its own `try/except` and default to
`0.0` so dataset retrieval remains reliable.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=bdfe3d80c51546fdc3a0c1233ecf8ab6db88c74595417c897041ce84b792a47b&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=bdfe3d80c51546fdc3a0c1233ecf8ab6db88c74595417c897041ce84b792a47b&reaction=dislike'>👎</a>
##########
superset/mcp_service/chart/tool/list_charts.py:
##########
@@ -156,3 +208,84 @@ def _serialize_chart(
except Exception as e:
await ctx.error("Failed to list charts: %s" % (str(e),))
raise
+
+
+def _list_charts_by_popularity(
+ request: ListChartsRequest,
+ dao_class: Any,
+ serializer: Callable[..., dict[str, Any] | None],
+ all_columns: list[str],
+ ctx: Context,
+) -> ChartList:
+ """Two-pass listing: sort all matching charts by popularity score."""
+ from superset.mcp_service.common.schema_discovery import
CHART_SORTABLE_COLUMNS
+
+ sorted_ids, scores, total_count = get_popularity_sorted_ids(
+ compute_fn=compute_chart_popularity,
+ dao_class=dao_class,
+ filters=request.filters,
+ search=request.search,
+ search_columns=CHART_SEARCH_COLUMNS,
+ order_direction=request.order_direction,
+ )
+
+ # Apply pagination to sorted IDs
+ page = max(request.page - 1, 0)
+ page_size = request.page_size
+ start = page * page_size
+ end = start + page_size
+
+ # Fetch full models for page IDs
+ if page_ids := sorted_ids[start:end]:
+ items = dao_class.find_by_ids(page_ids)
+ # Preserve popularity sort order
+ id_to_item = {item.id: item for item in items}
+ ordered_items = [id_to_item[pid] for pid in page_ids if pid in
id_to_item]
+ else:
+ ordered_items = []
+
+ # Serialize - preserve the original request columns for response filtering
+ columns_requested = request.select_columns or DEFAULT_CHART_COLUMNS
+ # Include popularity_score in response when sorting by it, even if not
+ # explicitly in select_columns (so clients can see the sort key)
+ if "popularity_score" not in columns_requested:
+ columns_requested = list(columns_requested) + ["popularity_score"]
+ # Expand select_columns for internal loading
+ select_columns = list(columns_requested)
+
+ chart_objs = []
+ for item in ordered_items:
+ obj = serializer(item, select_columns)
+ if obj is not None:
+ chart_objs.append(obj)
+
+ # Attach scores
+ attach_popularity_scores(chart_objs, scores)
+
+ total_pages = (total_count + page_size - 1) // page_size if page_size > 0
else 0
+ pagination_info = PaginationInfo(
+ page=page,
+ page_size=page_size,
+ total_count=total_count,
+ total_pages=total_pages,
+ has_next=page < total_pages - 1,
+ has_previous=page > 0,
+ )
+
+ return ChartList(
+ charts=chart_objs,
+ count=len(chart_objs),
+ total_count=total_count,
+ page=page,
+ page_size=page_size,
+ total_pages=total_pages,
+ has_previous=page > 0,
+ has_next=page < total_pages - 1,
+ columns_requested=columns_requested,
Review Comment:
**Suggestion:** The popularity-sorted path returns zero-based `page` values,
but the rest of the MCP list stack and request contract are 1-based in
responses. This causes inconsistent pagination metadata depending on sort mode
and can break clients that rely on a stable page convention. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Popularity-sorted chart pagination metadata becomes inconsistent.
- ⚠️ Clients relying on 1-based paging may mis-navigate.
- ❌ First popularity page can report page zero.
```
</details>
```suggestion
total_pages = (total_count + page_size - 1) // page_size if page_size >
0 else 0
page_1based = page + 1
pagination_info = PaginationInfo(
page=page_1based,
page_size=page_size,
total_count=total_count,
total_pages=total_pages,
has_next=page < total_pages - 1,
has_previous=page > 0,
)
return ChartList(
charts=chart_objs,
count=len(chart_objs),
total_count=total_count,
page=page_1based,
page_size=page_size,
total_pages=total_pages,
has_previous=page > 0,
has_next=page < total_pages - 1,
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Call MCP tool `list_charts` (registered in
`superset/mcp_service/chart/tool/list_charts.py:86-88`) with
`order_column="popularity_score"` and `page=1`.
2. Execution takes popularity branch at
`superset/mcp_service/chart/tool/list_charts.py:139-143`, which calls
`_list_charts_by_popularity`.
3. `_list_charts_by_popularity` converts request page to zero-based at
`superset/mcp_service/chart/tool/list_charts.py:233` (`page=max(request.page
- 1, 0)`),
then returns `PaginationInfo.page=page` and `ChartList.page=page` at lines
`267` and
`279`.
4. Compare with standard path: `ModelListCore.run_tool` explicitly returns
1-based
response page (`page_1based = page + 1`) at
`superset/mcp_service/mcp_core.py:184-209`;
also app instructions say list tools are 1-based
(`superset/mcp_service/app.py:62`,
`175`). Result: popularity-sorted responses return `page=0` for first page,
inconsistent
with non-popularity responses.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/chart/tool/list_charts.py
**Line:** 265:284
**Comment:**
*Logic Error: The popularity-sorted path returns zero-based `page`
values, but the rest of the MCP list stack and request contract are 1-based in
responses. This causes inconsistent pagination metadata depending on sort mode
and can break clients that rely on a stable page convention.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=828fa53cd7c65ca1b891d2178835244940e7a296a5480d71bf76673fadd950bc&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=828fa53cd7c65ca1b891d2178835244940e7a296a5480d71bf76673fadd950bc&reaction=dislike'>👎</a>
##########
superset/mcp_service/dataset/tool/list_datasets.py:
##########
@@ -176,3 +220,82 @@ def _serialize_dataset(
)
)
raise
+
+
+def _list_datasets_by_popularity(
+ request: ListDatasetsRequest,
+ dao_class: Any,
+ serializer: Callable[..., dict[str, Any] | None],
+ all_columns: list[str],
+ ctx: Context,
+) -> DatasetList:
+ """Two-pass listing: sort all matching datasets by popularity score."""
+ from superset.mcp_service.common.schema_discovery import
DATASET_SORTABLE_COLUMNS
+
+ sorted_ids, scores, total_count = get_popularity_sorted_ids(
+ compute_fn=compute_dataset_popularity,
+ dao_class=dao_class,
+ filters=request.filters,
+ search=request.search,
+ search_columns=DATASET_SEARCH_COLUMNS,
+ order_direction=request.order_direction,
+ )
+
+ # Apply pagination to sorted IDs
+ page = max(request.page - 1, 0)
+ page_size = request.page_size
+ start = page * page_size
+ end = start + page_size
Review Comment:
**Suggestion:** The popularity-sort branch returns zero-based pagination
metadata (`page=0` for the first page), but the request schema and the standard
`ModelListCore` path use one-based pages. This creates an API contract break
where the same endpoint reports different page numbering depending on sort
mode. Convert the internal page index back to one-based before building
`PaginationInfo` and `DatasetList`. [off-by-one]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Popularity-sorted dataset pagination reports wrong page number.
- ⚠️ Clients relying on 1-based pagination mis-handle navigation.
- ⚠️ Same endpoint behaves differently by sort mode.
```
</details>
```suggestion
page_index = max(request.page - 1, 0)
page_size = request.page_size
start = page_index * page_size
end = start + page_size
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Call MCP tool `list_datasets` (tool entrypoint at
`superset/mcp_service/dataset/tool/list_datasets.py:75-77`) with request
`page=1`,
`page_size=10`, `order_column="popularity_score"`.
2. Request follows popularity branch at `list_datasets.py:137-142` and
invokes
`_list_datasets_by_popularity(...)`.
3. `_list_datasets_by_popularity` converts to zero-based index (`page =
max(request.page -
1, 0)`) at `list_datasets.py:245`, then returns `DatasetList.page=page` and
`PaginationInfo.page=page` at `list_datasets.py:276-290`.
4. Observe response `page=0` for first page, which conflicts with declared
1-based
contract in `ListDatasetsRequest`
(`superset/mcp_service/dataset/schemas.py:251-254`) and
standard path behavior in `ModelListCore`
(`superset/mcp_service/mcp_core.py:184-209`).
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/dataset/tool/list_datasets.py
**Line:** 245:248
**Comment:**
*Off By One: The popularity-sort branch returns zero-based pagination
metadata (`page=0` for the first page), but the request schema and the standard
`ModelListCore` path use one-based pages. This creates an API contract break
where the same endpoint reports different page numbering depending on sort
mode. Convert the internal page index back to one-based before building
`PaginationInfo` and `DatasetList`.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=f85ebc4b585b1432fa244998e5a2ada8934a35fdda12a6d34d15175fa9f8190e&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38416&comment_hash=f85ebc4b585b1432fa244998e5a2ada8934a35fdda12a6d34d15175fa9f8190e&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]