Re: [PR] fix(mcp): trim get_dataset_info response to prevent oversized payloads [superset]

via GitHub Tue, 05 May 2026 16:32:59 -0700


codeant-ai-for-open-source[bot] commented on code in PR #39898:
URL: https://github.com/apache/superset/pull/39898#discussion_r3192150614



##########
superset/mcp_service/dataset/schemas.py:
##########
@@ -93,6 +93,29 @@ class TableColumnInfo(BaseModel):
     filterable: bool | None = Field(None, description="Is filterable")
     description: str | None = Field(None, description="Column description")
 
+    @model_serializer(mode="wrap")
+    def _filter_column_fields_by_context(
+        self, serializer: Any, info: Any
+    ) -> Dict[str, Any]:
+        """Filter column fields based on serialization context.
+
+        If context contains 'column_fields', only include those fields.
+        Otherwise, include all fields. This trims wide datasets so a
+        50-column dataset doesn't ship 50 long descriptions when the
+        caller only needs column_name + type.
+        """
+        data = serializer(self)
+
+        if info.context and isinstance(info.context, dict):
+            column_fields = info.context.get("column_fields")
+            if column_fields:
+                requested = set(column_fields)
+                # Always preserve column_name as the only required field
+                requested.add("column_name")
+                return {k: v for k, v in data.items() if k in requested}

Review Comment:
   **Suggestion:** The column filtering check treats an explicitly provided empty list
   as "no filter" and returns all column fields. This is a logic bug because callers can
   pass `column_fields=[]` (or values that parse to an empty list) and unexpectedly get
   verbose fields like `description` for every column, which defeats the payload-size
   reduction and can reintroduce oversized responses/timeouts. Handle empty lists as a
   valid filter input (e.g., still enforce the minimal required field set) instead of
   falling back to full serialization. [logic error]
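
One way to read this suggestion, offered as a minimal sketch rather than the PR's actual fix: compare `column_fields` against `None` instead of relying on truthiness, so an empty list still reduces each column to `column_name`. The stand-alone function `filter_column_fields` and the sample column are hypothetical; the pydantic `model_serializer` wrapper is left out, and the function takes the already-serialized dict and the context directly.

```python
from typing import Any, Dict, Optional


def filter_column_fields(
    data: Dict[str, Any], context: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Hypothetical stand-alone version of the filtering branch."""
    if not isinstance(context, dict):
        return data
    column_fields = context.get("column_fields")
    # Only a missing or None value means "no filter"; [] is a valid,
    # minimal filter and must not fall through to full serialization.
    if column_fields is None:
        return data
    # column_name is always preserved, as in the original diff.
    requested = set(column_fields) | {"column_name"}
    return {k: v for k, v in data.items() if k in requested}


# Illustrative column; the values are made up for this example.
column = {"column_name": "id", "type": "INTEGER", "description": "Primary key"}

print(filter_column_fields(column, {"column_fields": []}))
# -> {'column_name': 'id'}
```

Whether `[]` should mean "minimal fields" or be rejected at validation time is a design choice for the PR author; the sketch only shows the first option.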
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ MCP `get_dataset_info` cannot honor explicit empty column_fields.
   - ⚠️ Wide datasets may still return verbose per-column descriptions.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. In `superset/tests/unit_tests/mcp_service/dataset/tool/test_dataset_tools.py:18-31`,
   copy the pattern of `test_get_dataset_info_respects_column_fields` but change the request
   payload to use an empty list for `column_fields`:

      `{"request": {"identifier": 3, "select_columns": ["id", "columns"], "column_fields": []}}`.

   2. This request is validated into `GetDatasetInfoRequest` in
   `superset/mcp_service/dataset/schemas.py:172-221`; the `@field_validator("column_fields")`
   calls `parse_json_or_list` (see `schema_utils.py:111-151`), which returns `[]` unchanged
   for a Python list, so `request.column_fields` is an empty list, not `None`.

   3. The MCP tool handler `get_dataset_info` in
   `superset/mcp_service/dataset/tool/get_dataset_info.py:21-27` fetches a `DatasetInfo`
   instance, then at lines 119-126 calls `result.model_dump(..., context={"select_columns":
   request.select_columns, "column_fields": request.column_fields})`, so
   `info.context["column_fields"]` is `[]` for this call.

   4. During serialization, each `TableColumnInfo` is processed by
   `_filter_column_fields_by_context` in `superset/mcp_service/dataset/schemas.py:27-48`;
   `info.context` is a dict and `column_fields` is `[]`, so the `if column_fields:` check at
   lines 40-42 evaluates false and the method returns `data` unfiltered at line 48, including
   verbose fields like `description`, `groupby`, `filterable`, etc. This contradicts the
   request's explicit `column_fields=[]` and re-expands column payloads, undermining the PR's
   goal of trimming oversized responses.
   ```
   </details>
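
The reproduction steps come down to the truthiness of an empty list. The sketch below mirrors the branch structure of the diff with pydantic stripped out. The function name `filter_as_written`, the sample column, and its values are assumptions for illustration, and the trailing `return data` is taken from the description in step 4 rather than from the visible hunk.

```python
from typing import Any, Dict


def filter_as_written(data: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Mirror of the reviewed branch logic, without the pydantic wrapper."""
    if context and isinstance(context, dict):
        column_fields = context.get("column_fields")
        if column_fields:  # [] is falsy, so the filter is skipped
            requested = set(column_fields)
            requested.add("column_name")
            return {k: v for k, v in data.items() if k in requested}
    return data  # fallback described in step 4: everything is returned


# Illustrative column and context; shapes follow the steps above.
column = {"column_name": "id", "type": "INTEGER", "description": "Primary key"}
context = {"select_columns": ["id", "columns"], "column_fields": []}

unfiltered = filter_as_written(column, context)
print(sorted(unfiltered))
# -> ['column_name', 'description', 'type']
```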
   
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/dataset/schemas.py
   **Line:** 109:115
   **Comment:**
   *Logic Error:* The column filtering check treats an explicitly provided empty list as
   "no filter" and returns all column fields. This is a logic bug because callers can pass
   `column_fields=[]` (or values that parse to an empty list) and unexpectedly get verbose
   fields like `description` for every column, which defeats the payload-size reduction
   and can reintroduce oversized responses/timeouts. Handle empty lists as a valid filter
   input (e.g., still enforce the minimal required field set) instead of falling back to
   full serialization.

   Validate the correctness of the flagged issue. If it is correct, how can I resolve it?
   If you propose a fix, implement it and keep it concise.
   Once the fix is implemented, check the other comments on the same PR and ask the user
   whether to fix the rest of them as well. If the user says yes, fetch all the comments,
   validate their correctness, and implement a minimal fix for each.
   ```
   </details>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

