Copilot commented on code in PR #40473:
URL: https://github.com/apache/superset/pull/40473#discussion_r3358257445


##########
superset/mcp_service/dashboard/schemas.py:
##########
@@ -288,6 +288,33 @@ def validate_search_and_filters(self) -> 
"ListDashboardsRequest":
         return self
 
 
+DEFAULT_GET_DASHBOARD_INFO_COLUMNS: List[str] = [
+    "id",
+    "dashboard_title",
+    "slug",
+    "description",
+    "certified_by",
+    "certification_details",
+    "published",
+    "is_managed_externally",
+    "external_url",
+    "created_on",
+    "changed_on",
+    "uuid",
+    "url",
+    "created_on_humanized",
+    "changed_on_humanized",
+    "chart_count",
+    "tags",
+    "charts",
+    "native_filters",
+    "cross_filters_enabled",
+    "is_permalink_state",
+    "permalink_key",
+    "filter_state",
+]

Review Comment:
   `DEFAULT_GET_DASHBOARD_INFO_COLUMNS` includes `filter_state`, but the 
`select_columns` field description says the default lean set excludes 
`filter_state` (and `get_dashboard_info` has logic to append it only when 
`permalink_key` is provided). Including it in the default makes permalink 
responses large by default and makes the `effective_select_columns` conditional 
a no-op.



##########
superset/mcp_service/dashboard/schemas.py:
##########
@@ -312,6 +339,28 @@ class GetDashboardInfoRequest(MetadataCacheControl):
             "from that permalink."
         ),
     )
+    select_columns: Annotated[
+        List[str],
+        Field(
+            default_factory=lambda: list(DEFAULT_GET_DASHBOARD_INFO_COLUMNS),
+            description=(
+                "Top-level fields to include in the response. Defaults to a 
lean "
+                "set that excludes 'css' (raw CSS, can be many KB) and 
'filter_state' "
+                "(only relevant when permalink_key is provided). Pass an 
explicit list "
+                "to override, e.g. ['id','dashboard_title','charts'] for 
minimal "
+                "output, or add 'css' to include raw dashboard CSS."
+            ),
+        ),
+    ]
+
+    @field_validator("select_columns", mode="before")
+    @classmethod
+    def _parse_select_columns(cls, value: Any) -> Any:
+        from superset.mcp_service.utils.schema_utils import parse_json_or_list
+
+        if value is None:
+            return list(DEFAULT_GET_DASHBOARD_INFO_COLUMNS)
+        return parse_json_or_list(value, "select_columns")

Review Comment:
   `_parse_select_columns` can return an empty list when callers send `""` or 
`[]` (since `parse_json_or_list` returns `[]`). Because 
`DashboardInfo._filter_fields_by_context` only filters when `select_columns` is 
truthy, an empty list disables filtering and returns the full payload 
(including large fields like `css`), which is the opposite of what 
`select_columns` implies and can re-trigger response truncation.



##########
superset/mcp_service/chart/schemas.py:
##########
@@ -298,6 +332,15 @@ def validate_identifier_or_form_data_key(self) -> 
"GetChartInfoRequest":
             )
         return self
 
+    @field_validator("select_columns", mode="before")
+    @classmethod
+    def _parse_select_columns(cls, value: Any) -> Any:
+        from superset.mcp_service.utils.schema_utils import parse_json_or_list
+
+        if value is None:
+            return list(DEFAULT_GET_CHART_INFO_COLUMNS)
+        return parse_json_or_list(value, "select_columns")
+

Review Comment:
   `GetChartInfoRequest._parse_select_columns` can return `[]` for `""`/`[]` 
inputs (because `parse_json_or_list` returns `[]`). Since 
`ChartInfo._filter_fields_by_context` only filters when `select_columns` is 
truthy, an empty list disables filtering and returns the full payload 
(including `form_data`), undermining the lean-default behavior and potentially 
re-triggering ResponseSizeGuard truncation.



##########
tests/unit_tests/mcp_service/dataset/tool/test_dataset_tools.py:
##########
@@ -1641,6 +1641,47 @@ async def 
test_default_columns_filters_actual_response_data(
                 )
 
 
+class TestGetDatasetInfoRequestValidators:
+    """Unit tests for GetDatasetInfoRequest field validators."""
+
+    def test_column_fields_json_string_parses_to_list(self):
+        """JSON string input for column_fields is decoded into a list."""
+        from superset.mcp_service.dataset.schemas import GetDatasetInfoRequest
+
+        request = GetDatasetInfoRequest(
+            identifier=1,
+            column_fields='["column_name","type"]',
+        )
+        assert request.column_fields == ["column_name", "type"]
+
+    def test_column_fields_empty_list_stays_empty(self):
+        """An explicit empty list for column_fields is preserved as-is."""
+        from superset.mcp_service.dataset.schemas import GetDatasetInfoRequest
+
+        request = GetDatasetInfoRequest(identifier=1, column_fields=[])
+        assert request.column_fields == []

Review Comment:
   This test locks in behavior where `column_fields=[]` is preserved. With the 
current serializers (`if column_fields:`), an empty list disables filtering and 
returns *all* per-column fields, which is likely to bloat responses and defeat 
the lean-default goal. If the intention is “empty means default lean fields” 
(same motivation as `select_columns`), update the expectation accordingly.



##########
superset/mcp_service/chart/tool/get_chart_info.py:
##########
@@ -333,6 +334,11 @@ async def get_chart_info(
             error = await _attach_dashboard_filters(result, 
request.dashboard_id, ctx)
             if error is not None:
                 return error
+
+        return result.model_dump(
+            mode="json",
+            context={"select_columns": request.select_columns},
+        )

Review Comment:
   Only the saved-chart path returns `result.model_dump(..., 
context={"select_columns": ...})`. The form_data_key-only (unsaved) path still 
returns a `ChartInfo` model directly, so `select_columns` filtering is never 
applied there and large `form_data` payloads can still be returned by default. 
To achieve consistent lean defaults, the unsaved path should also apply 
redaction (if needed) and then `model_dump(mode="json", 
context={"select_columns": request.select_columns})` before returning.



##########
superset/mcp_service/dataset/schemas.py:
##########
@@ -315,13 +334,78 @@ def create(cls, error: str, error_type: str) -> 
"DatasetError":
         )
 
 
+DEFAULT_GET_DATASET_INFO_COLUMNS: List[str] = [
+    "id",
+    "table_name",
+    "schema",
+    "database_name",
+    "database_id",
+    "uuid",
+    "is_virtual",
+    "description",
+    "main_dttm_col",
+    "sql",
+    "url",
+    "columns",
+    "metrics",
+]
+
+DEFAULT_GET_DATASET_INFO_COLUMN_FIELDS: List[str] = [
+    "column_name",
+    "type",
+    "is_dttm",
+]
+
+
 class GetDatasetInfoRequest(MetadataCacheControl):
     """Request schema for get_dataset_info with support for ID or UUID."""
 
     identifier: Annotated[
         int | str,
         Field(description="Dataset identifier - can be numeric ID or UUID 
string"),
     ]
+    select_columns: Annotated[
+        List[str],
+        Field(
+            default_factory=lambda: list(DEFAULT_GET_DATASET_INFO_COLUMNS),
+            description=(
+                "Top-level fields to include in the response. Defaults to a 
lean "
+                "set that excludes verbose fields like params, 
template_params, "
+                "extra, tags, certification_details. Pass an explicit list to "
+                "override (e.g. ['id','table_name','columns'] for minimal 
output)."
+            ),
+        ),
+    ]
+    column_fields: Annotated[
+        List[str],
+        Field(
+            default_factory=lambda: 
list(DEFAULT_GET_DATASET_INFO_COLUMN_FIELDS),
+            description=(
+                "Per-column fields to include for entries in 'columns'. 
Defaults "
+                "to ['column_name','type','is_dttm']. Pass a wider list to "
+                "include 'verbose_name','groupby','filterable','description' "
+                "when needed."
+            ),
+        ),
+    ]
+
+    @field_validator("select_columns", mode="before")
+    @classmethod
+    def _parse_select_columns(cls, value: Any) -> Any:
+        from superset.mcp_service.utils.schema_utils import parse_json_or_list
+
+        if value is None:
+            return list(DEFAULT_GET_DATASET_INFO_COLUMNS)
+        return parse_json_or_list(value, "select_columns")
+
+    @field_validator("column_fields", mode="before")
+    @classmethod
+    def _parse_column_fields(cls, value: Any) -> Any:
+        from superset.mcp_service.utils.schema_utils import parse_json_or_list
+
+        if value is None:
+            return list(DEFAULT_GET_DATASET_INFO_COLUMN_FIELDS)
+        return parse_json_or_list(value, "column_fields")

Review Comment:
   `GetDatasetInfoRequest` validators can return empty lists for `""`/`[]` 
inputs (via `parse_json_or_list`). Because 
`DatasetInfo._filter_fields_by_context` and 
`TableColumnInfo._filter_column_fields_by_context` only filter when the lists 
are truthy, an empty list disables filtering and returns the full dataset 
payload / full per-column payload (the opposite of what `select_columns` / 
`column_fields` implies), which can reintroduce oversized responses.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to