Re: [PR] fix(mcp): trim get_dataset_info response to prevent oversized payloads [superset]

via GitHub Tue, 05 May 2026 17:21:01 -0700


aminghadersohi commented on code in PR #39898:
URL: https://github.com/apache/superset/pull/39898#discussion_r3192191849



##########
superset/mcp_service/dataset/schemas.py:
##########
@@ -93,6 +93,29 @@ class TableColumnInfo(BaseModel):
     filterable: bool | None = Field(None, description="Is filterable")
     description: str | None = Field(None, description="Column description")
 
+    @model_serializer(mode="wrap")
+    def _filter_column_fields_by_context(
+        self, serializer: Any, info: Any
+    ) -> Dict[str, Any]:
+        """Filter column fields based on serialization context.
+
+        If context contains 'column_fields', only include those fields.
+        Otherwise, include all fields. This trims wide datasets so a
+        50-column dataset doesn't ship 50 long descriptions when the
+        caller only needs column_name + type.
+        """
+        data = serializer(self)
+
+        if info.context and isinstance(info.context, dict):
+            column_fields = info.context.get("column_fields")
+            if column_fields:
+                requested = set(column_fields)
+                # Always preserve column_name as the only required field
+                requested.add("column_name")
+                return {k: v for k, v in data.items() if k in requested}

Review Comment:
   Good catch. Fixed in b888574b16 — both `select_columns=[]` and 
`column_fields=[]` now coerce to the lean default in the field validators 
rather than falling through to "no filter". Added a regression test 
(`test_get_dataset_info_empty_lists_fall_back_to_defaults`) that passes an 
empty list and asserts the lean defaults are still applied.
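
For readers following along, the behavior described above can be sketched roughly as follows. This is a stdlib-only illustration, not the code from the commit: `LEAN_COLUMN_FIELDS`, `coerce_column_fields`, and `filter_column` are hypothetical names, and the default field list is assumed from the docstring's "column_name + type" wording. The point is that `if column_fields:` treats `[]` as falsy, so an empty list has to be coerced to the default before the filter runs.

```python
from __future__ import annotations

from typing import Any

# Hypothetical lean default; the actual default list is defined in the PR.
LEAN_COLUMN_FIELDS = ["column_name", "type"]


def coerce_column_fields(value: list[str] | None) -> list[str]:
    """Treat None and [] alike: both fall back to the lean default."""
    if not value:
        return list(LEAN_COLUMN_FIELDS)
    return list(value)


def filter_column(
    data: dict[str, Any], column_fields: list[str] | None
) -> dict[str, Any]:
    """Keep only the requested keys, always preserving column_name."""
    requested = set(coerce_column_fields(column_fields))
    requested.add("column_name")
    return {k: v for k, v in data.items() if k in requested}


column = {
    "column_name": "order_id",
    "type": "BIGINT",
    "filterable": True,
    "description": "Primary key of the orders table",
}

# An empty list no longer means "no filter"; it yields the lean default.
lean = filter_column(column, [])
# An explicit request is honored, with column_name added back in.
explicit = filter_column(column, ["description"])
```

Doing the coercion once, before filtering, means the serializer only ever sees a non-empty field list, so the "no filter" branch cannot be reached by accident.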



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
