codeant-ai-for-open-source[bot] commented on code in PR #40063:
URL: https://github.com/apache/superset/pull/40063#discussion_r3281655576


##########
superset/mcp_service/chart/tool/get_chart_preview.py:
##########
@@ -148,22 +149,89 @@ class ChartLike(Protocol):
     uuid: Any
 
 
-def _build_query_columns(form_data: Dict[str, Any]) -> list[str]:
-    """Build query columns list from form_data, including both x_axis and 
groupby."""
-    x_axis_config = form_data.get("x_axis")
-    groupby_columns: list[str] = form_data.get("groupby") or []
+def _build_query_columns(form_data: Dict[str, Any]) -> list[Column]:
+    """Build query columns list from form_data, including both x_axis and 
groupby.
+
+    Handles chart-type-specific keys:
+    - Standard charts: ``groupby`` + ``x_axis``
+    - Pivot tables: ``groupbyColumns`` + ``groupbyRows`` (when ``groupby`` is 
absent)
+    - Mixed timeseries: ``groupby_b`` (secondary groupby)
+    """
+    x_axis_config: Column | None = form_data.get("x_axis")
+    groupby_columns: list[Column] = form_data.get("groupby") or []
+
+    # Pivot tables store dimensions under groupbyColumns / groupbyRows
+    if not groupby_columns:
+        pivot_rows: list[Column] = form_data.get("groupbyRows") or []
+        pivot_cols: list[Column] = form_data.get("groupbyColumns") or []
+        groupby_columns = list(pivot_rows) + list(pivot_cols)
+
+    # Mixed timeseries stores secondary groupby under groupby_b
+    groupby_b: list[Column] = form_data.get("groupby_b") or []
+    for col in groupby_b:
+        if col not in groupby_columns:
+            groupby_columns.append(col)

Review Comment:
   **Suggestion:** Both `groupby` and `groupby_b` are read as if they are 
always lists, but legacy chart form data can supply a single string value; 
iterating these values then treats each character as a separate column name. 
This produces corrupted column lists and invalid group-by clauses. Normalize 
string inputs to a one-item list before iterating. [type error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Future use of `_build_query_columns` risks broken group-bys.
   - ⚠️ Single-string groupby values become per-character grouping keys.
   - ⚠️ Mixed timeseries secondary grouping can be silently corrupted.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. `_build_query_columns` in
   `superset/mcp_service/chart/tool/get_chart_preview.py:152-195` initializes
   `groupby_columns` and secondary groupbys as:
   
      - `groupby_columns: list[Column] = form_data.get("groupby") or []` (line 
161),
   
      - `groupby_b: list[Column] = form_data.get("groupby_b") or []` (line 170),
   
      and later iterates `for col in groupby_b:` appending `col` to 
`groupby_columns` if not
      present (lines 171-173), and then `for col in groupby_columns:` when 
building the final
      `columns` list (line 192).
   
   2. Elsewhere in the codebase, group-by fields are explicitly normalized to 
handle string
   scalars: `resolve_groupby` in 
`superset/mcp_service/chart/chart_helpers.py:64-75` does
   `raw_groupby = form_data.get("groupby") or []` and then wraps string values 
into a
   one-element list (`groupby = [raw_groupby] if isinstance(raw_groupby, str) 
else
   list(raw_groupby)`), demonstrating that `groupby` can legitimately be a 
single string
   (e.g. `"country"`).
   
   3. If `_build_query_columns` is called with form_data containing a scalar 
groupby string,
   e.g. `{"groupby": "country"}`, then `groupby_columns` at line 161 becomes 
the string
   `"country"`. Iterating `for col in groupby_columns` at line 192 treats each 
character as a
   separate column name; the deduplication helper `_add_unique` (lines 179-183) 
sees `col` as
   `"c"`, `"o"`, `"u"`, etc., so the resulting `columns` list becomes `["c", 
"o", "u", "n",
   "t", "r", "y"]` instead of `["country"]`. The same character-splitting 
problem applies to
   `groupby_b` at line 170 when it is a string.
   
   4. Once `_build_query_columns` is wired into the production preview query 
path (as tests
   and the PR description imply), any chart preview that supplies a single 
group-by dimension
   as a string—consistent with the case handled by `resolve_groupby`—will 
generate corrupted
   group-by column lists. This would yield incorrect or failing queries for 
such charts,
   particularly affecting mixed timeseries (`groupby_b`) and any legacy or 
single-dimension
   configurations using scalar `groupby`.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=d57322dcc7a6422c92e58e5fc63de9d3&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=d57322dcc7a6422c92e58e5fc63de9d3&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/chart/tool/get_chart_preview.py
   **Line:** 161:173
   **Comment:**
        *Type Error: Both `groupby` and `groupby_b` are read as if they are 
always lists, but legacy chart form data can supply a single string value; 
iterating these values then treats each character as a separate column name. 
This produces corrupted column lists and invalid group-by clauses. Normalize 
string inputs to a one-item list before iterating.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40063&comment_hash=790e2e363549e07c23e4537b79c62a650b90a25a892412e8d2b8319fbd306f32&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40063&comment_hash=790e2e363549e07c23e4537b79c62a650b90a25a892412e8d2b8319fbd306f32&reaction=dislike'>👎</a>



##########
superset/mcp_service/chart/tool/get_chart_preview.py:
##########
@@ -148,22 +149,89 @@ class ChartLike(Protocol):
     uuid: Any
 
 
-def _build_query_columns(form_data: Dict[str, Any]) -> list[str]:
-    """Build query columns list from form_data, including both x_axis and 
groupby."""
-    x_axis_config = form_data.get("x_axis")
-    groupby_columns: list[str] = form_data.get("groupby") or []
+def _build_query_columns(form_data: Dict[str, Any]) -> list[Column]:
+    """Build query columns list from form_data, including both x_axis and 
groupby.
+
+    Handles chart-type-specific keys:
+    - Standard charts: ``groupby`` + ``x_axis``
+    - Pivot tables: ``groupbyColumns`` + ``groupbyRows`` (when ``groupby`` is 
absent)
+    - Mixed timeseries: ``groupby_b`` (secondary groupby)
+    """
+    x_axis_config: Column | None = form_data.get("x_axis")
+    groupby_columns: list[Column] = form_data.get("groupby") or []
+
+    # Pivot tables store dimensions under groupbyColumns / groupbyRows
+    if not groupby_columns:
+        pivot_rows: list[Column] = form_data.get("groupbyRows") or []
+        pivot_cols: list[Column] = form_data.get("groupbyColumns") or []
+        groupby_columns = list(pivot_rows) + list(pivot_cols)
+
+    # Mixed timeseries stores secondary groupby under groupby_b
+    groupby_b: list[Column] = form_data.get("groupby_b") or []
+    for col in groupby_b:
+        if col not in groupby_columns:
+            groupby_columns.append(col)
+
+    # Deduplicate while preserving order
+    seen: set[str] = set()
+    columns: list[Column] = []
+
+    def _add_unique(col: Column) -> None:
+        key = col if isinstance(col, str) else col.get("label", str(col))
+        if key not in seen:
+            columns.append(col)
+            seen.add(key)
 
-    columns = groupby_columns.copy()
     if x_axis_config and isinstance(x_axis_config, str):
-        if x_axis_config not in columns:
-            columns.insert(0, x_axis_config)
+        _add_unique(x_axis_config)
     elif x_axis_config and isinstance(x_axis_config, dict):
         col_name = x_axis_config.get("column_name")
-        if col_name and col_name not in columns:
-            columns.insert(0, col_name)
+        if col_name and isinstance(col_name, str):
+            _add_unique(col_name)
+
+    for col in groupby_columns:
+        _add_unique(col)
+
     return columns
 
 
+def _build_query_metrics(form_data: Dict[str, Any]) -> list[Metric]:
+    """Extract metrics from form_data, handling chart-type variations.
+
+    Handles:
+    - ``metrics`` (plural) — most chart types
+    - ``metric`` (singular) — Pie charts
+    - ``metrics_b`` — secondary y-axis in Mixed Timeseries charts
+    """
+    metrics: list[Metric] = list(form_data.get("metrics") or [])

Review Comment:
   **Suggestion:** Converting `form_data["metrics"]` with `list(...)` will 
split string metrics into characters (for example `"count"` becomes 
`["c","o","u","n","t"]`), producing invalid metric payloads and broken query 
generation for legacy or malformed form data that sends a single metric string. 
Normalize by handling `str` explicitly before casting to list. [type error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Future integration of `_build_query_metrics` risks invalid metrics.
   - ⚠️ Legacy single-metric strings could break chart preview queries.
   - ⚠️ Debugging character-split metrics in MCP previews is non-trivial.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. The helper `_build_query_metrics` is defined in
   `superset/mcp_service/chart/tool/get_chart_preview.py:198-218`. Its first 
line initializes
   metrics as `metrics: list[Metric] = list(form_data.get("metrics") or [])` 
(line 206).
   
   2. If `form_data["metrics"]` is a string instead of a list, e.g. `{"metrics":
   "SUM(revenue)"}`, then Python's `list()` constructor will split the string 
into
   characters. Calling `_build_query_metrics({"metrics": "SUM(revenue)"})` from 
any caller
   will produce `["S", "U", "M", "(", "r", ...]` instead of `["SUM(revenue)"]`, 
because
   `list("SUM(revenue)")` iterates characters.
   
   3. The codebase already treats some scalar metric fields as strings (for 
example, the
   singular `metric` field is a string in 
`test_build_query_metrics_singular_for_pie` at
   
`superset/tests/unit_tests/mcp_service/chart/tool/test_get_chart_preview.py:20-23`,
 and
   `chart_helpers.resolve_metrics` at 
`superset/mcp_service/chart/chart_helpers.py:53-61`
   explicitly wraps a singular `metric` into a list). This shows that 
single-metric-as-string
   is an accepted shape in form_data.
   
   4. Once `_build_query_metrics` is wired into the live preview query 
construction (as
   implied by the PR description and by tests importing it at
   
`superset/tests/unit_tests/mcp_service/chart/tool/test_get_chart_preview.py:9-16`),
 any
   chart preview that receives a string-valued `metrics` (e.g. legacy or 
malformed form_data
   that placed the single metric under `metrics` instead of `metric`) would 
propagate a list
   of characters into `ChartDataCommand`. That would yield invalid metric 
payloads and likely
   a query-building or execution error rather than the intended aggregation 
metric.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=6032280b3f9947bd8e2eca60c51f92c6&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=6032280b3f9947bd8e2eca60c51f92c6&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/chart/tool/get_chart_preview.py
   **Line:** 206:206
   **Comment:**
        *Type Error: Converting `form_data["metrics"]` with `list(...)` will 
split string metrics into characters (for example `"count"` becomes 
`["c","o","u","n","t"]`), producing invalid metric payloads and broken query 
generation for legacy or malformed form data that sends a single metric string. 
Normalize by handling `str` explicitly before casting to list.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40063&comment_hash=d7ca8820932ae7d01630ab91328e5ad3510eb1ed3036ecfe1fef3b0366671889&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40063&comment_hash=d7ca8820932ae7d01630ab91328e5ad3510eb1ed3036ecfe1fef3b0366671889&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to