codeant-ai-for-open-source[bot] commented on code in PR #38725:
URL: https://github.com/apache/superset/pull/38725#discussion_r2955084916


##########
superset/mcp_service/chart/schemas.py:
##########
@@ -660,6 +660,32 @@ class XYChartConfig(BaseModel):
     filters: List[FilterConfig] | None = None
     row_limit: int = Field(10000, description="Max data points", ge=1, 
le=50000)
 
+    @model_validator(mode="before")
+    @classmethod
+    def handle_legacy_field_names(cls, data: Any) -> Any:
+        """Map Superset-native field names to MCP schema field names.
+
+        LLMs sometimes use native Superset field names instead of MCP field 
names.
+        This validator remaps known aliases before extra="forbid" validation 
runs.
+        """
+        if not isinstance(data, dict):
+            return data
+        # groupby: list[str] -> group_by: ColumnRef (take first item only)
+        # Always pop groupby to avoid extra="forbid" rejection; only map it if
+        # group_by is not already set (explicit group_by takes precedence).
+        if "groupby" in data:
+            groupby_val = data.pop("groupby")
+            if "group_by" not in data:
+                if isinstance(groupby_val, list) and groupby_val:
+                    data["group_by"] = {"name": str(groupby_val[0])}
+                elif isinstance(groupby_val, str):
+                    data["group_by"] = {"name": groupby_val}
+        # adhoc_filters: incompatible format with MCP FilterConfig - strip 
silently

Review Comment:
   **Suggestion:** Converting `groupby` entries with `str(...)` can turn 
non-string payloads into invalid column names (for example dicts become 
stringified objects), causing avoidable validation failures. Extract only valid 
string/dict name values instead of blindly coercing. [type error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Structured legacy groupby requests fail XY schema validation.
   - ❌ generate_explore_link cannot return usable explore URL.
   - ⚠️ LLM retries may drop grouping semantics.
   ```
   </details>
   
   ```suggestion
                   if isinstance(groupby_val, list) and groupby_val:
                                   first_groupby = groupby_val[0]
                                   if isinstance(first_groupby, str):
                                       data["group_by"] = {"name": 
first_groupby}
                                   elif isinstance(first_groupby, dict) and 
isinstance(
                                       first_groupby.get("name"), str
                                   ):
                                       data["group_by"] = {"name": 
first_groupby["name"]}
                               elif isinstance(groupby_val, str):
                                   data["group_by"] = {"name": groupby_val}
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Invoke `generate_explore_link`
   (`superset/mcp_service/explore/tool/generate_explore_link.py:51-54`) or 
`generate_chart`
   (`superset/mcp_service/chart/tool/generate_chart.py:131-134`) with
   `config.chart_type="xy"` and legacy `groupby=[{"name":"region"}]`.
   
   2. During request parsing, `handle_legacy_field_names()` coerces first list 
item using
   `str(groupby_val[0])` (`superset/mcp_service/chart/schemas.py:679-683`), 
producing
   `"{'name': 'region'}"`.
   
   3. `ColumnRef.name` validation requires regex 
`^[a-zA-Z0-9_][a-zA-Z0-9_\s\-\.]*$`
   (`superset/mcp_service/chart/schemas.py:385-390`), so stringified dict fails 
schema
   validation.
   
   4. `parse_request` converts that `ValidationError` into tool-level failure
   (`superset/mcp_service/utils/schema_utils.py:470-474`, `154-171`), blocking 
chart
   URL/chart generation.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/chart/schemas.py
   **Line:** 679:683
   **Comment:**
        *Type Error: Converting `groupby` entries with `str(...)` can turn 
non-string payloads into invalid column names (for example dicts become 
stringified objects), causing avoidable validation failures. Extract only valid 
string/dict name values instead of blindly coercing.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38725&comment_hash=c5383bbaf04c838c015cbece0f1514c5e5e83cb56ef9e928c185393dba4ce809&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38725&comment_hash=c5383bbaf04c838c015cbece0f1514c5e5e83cb56ef9e928c185393dba4ce809&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to