codeant-ai-for-open-source[bot] commented on code in PR #40063:
URL: https://github.com/apache/superset/pull/40063#discussion_r3281655576
##########
superset/mcp_service/chart/tool/get_chart_preview.py:
##########
@@ -148,22 +149,89 @@ class ChartLike(Protocol):
uuid: Any
-def _build_query_columns(form_data: Dict[str, Any]) -> list[str]:
- """Build query columns list from form_data, including both x_axis and
groupby."""
- x_axis_config = form_data.get("x_axis")
- groupby_columns: list[str] = form_data.get("groupby") or []
+def _build_query_columns(form_data: Dict[str, Any]) -> list[Column]:
+ """Build query columns list from form_data, including both x_axis and
groupby.
+
+ Handles chart-type-specific keys:
+ - Standard charts: ``groupby`` + ``x_axis``
+ - Pivot tables: ``groupbyColumns`` + ``groupbyRows`` (when ``groupby`` is
absent)
+ - Mixed timeseries: ``groupby_b`` (secondary groupby)
+ """
+ x_axis_config: Column | None = form_data.get("x_axis")
+ groupby_columns: list[Column] = form_data.get("groupby") or []
+
+ # Pivot tables store dimensions under groupbyColumns / groupbyRows
+ if not groupby_columns:
+ pivot_rows: list[Column] = form_data.get("groupbyRows") or []
+ pivot_cols: list[Column] = form_data.get("groupbyColumns") or []
+ groupby_columns = list(pivot_rows) + list(pivot_cols)
+
+ # Mixed timeseries stores secondary groupby under groupby_b
+ groupby_b: list[Column] = form_data.get("groupby_b") or []
+ for col in groupby_b:
+ if col not in groupby_columns:
+ groupby_columns.append(col)
Review Comment:
**Suggestion:** Both `groupby` and `groupby_b` are read as if they are
always lists, but legacy chart form data can supply a single string value;
iterating these values then treats each character as a separate column name.
This produces corrupted column lists and invalid group-by clauses. Normalize
string inputs to a one-item list before iterating. [type error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Future use of `_build_query_columns` risks broken group-bys.
- ⚠️ Single-string groupby values become per-character grouping keys.
- ⚠️ Mixed timeseries secondary grouping can be silently corrupted.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. `_build_query_columns` in
`superset/mcp_service/chart/tool/get_chart_preview.py:152-195` initializes
`groupby_columns` and secondary groupbys as:
- `groupby_columns: list[Column] = form_data.get("groupby") or []` (line
161),
- `groupby_b: list[Column] = form_data.get("groupby_b") or []` (line 170),
and later iterates `for col in groupby_b:` appending `col` to
`groupby_columns` if not
present (lines 171-173), and then `for col in groupby_columns:` when
building the final
`columns` list (line 192).
2. Elsewhere in the codebase, group-by fields are explicitly normalized to
handle string
scalars: `resolve_groupby` in
`superset/mcp_service/chart/chart_helpers.py:64-75` does
`raw_groupby = form_data.get("groupby") or []` and then wraps string values
into a
one-element list (`groupby = [raw_groupby] if isinstance(raw_groupby, str)
else
list(raw_groupby)`), demonstrating that `groupby` can legitimately be a
single string
(e.g. `"country"`).
3. If `_build_query_columns` is called with form_data containing a scalar
groupby string,
e.g. `{"groupby": "country"}`, then `groupby_columns` at line 161 becomes
the string
`"country"`. Iterating `for col in groupby_columns` at line 192 treats each
character as a
separate column name; the deduplication helper `_add_unique` (lines 179-183)
sees `col` as
`"c"`, `"o"`, `"u"`, etc., so the resulting `columns` list becomes `["c",
"o", "u", "n",
"t", "r", "y"]` instead of `["country"]`. The same character-splitting
problem applies to
`groupby_b` at line 170 when it is a string.
4. Once `_build_query_columns` is wired into the production preview query
path (as tests
and the PR description imply), any chart preview that supplies a single
group-by dimension
as a string—consistent with the case handled by `resolve_groupby`—will
generate corrupted
group-by column lists. This would yield incorrect or failing queries for
such charts,
particularly affecting mixed timeseries (`groupby_b`) and any legacy or
single-dimension
configurations using scalar `groupby`.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=d57322dcc7a6422c92e58e5fc63de9d3&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=d57322dcc7a6422c92e58e5fc63de9d3&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/chart/tool/get_chart_preview.py
**Line:** 161:173
**Comment:**
*Type Error: Both `groupby` and `groupby_b` are read as if they are
always lists, but legacy chart form data can supply a single string value;
iterating these values then treats each character as a separate column name.
This produces corrupted column lists and invalid group-by clauses. Normalize
string inputs to a one-item list before iterating.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40063&comment_hash=790e2e363549e07c23e4537b79c62a650b90a25a892412e8d2b8319fbd306f32&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40063&comment_hash=790e2e363549e07c23e4537b79c62a650b90a25a892412e8d2b8319fbd306f32&reaction=dislike'>👎</a>
##########
superset/mcp_service/chart/tool/get_chart_preview.py:
##########
@@ -148,22 +149,89 @@ class ChartLike(Protocol):
uuid: Any
-def _build_query_columns(form_data: Dict[str, Any]) -> list[str]:
- """Build query columns list from form_data, including both x_axis and
groupby."""
- x_axis_config = form_data.get("x_axis")
- groupby_columns: list[str] = form_data.get("groupby") or []
+def _build_query_columns(form_data: Dict[str, Any]) -> list[Column]:
+ """Build query columns list from form_data, including both x_axis and
groupby.
+
+ Handles chart-type-specific keys:
+ - Standard charts: ``groupby`` + ``x_axis``
+ - Pivot tables: ``groupbyColumns`` + ``groupbyRows`` (when ``groupby`` is
absent)
+ - Mixed timeseries: ``groupby_b`` (secondary groupby)
+ """
+ x_axis_config: Column | None = form_data.get("x_axis")
+ groupby_columns: list[Column] = form_data.get("groupby") or []
+
+ # Pivot tables store dimensions under groupbyColumns / groupbyRows
+ if not groupby_columns:
+ pivot_rows: list[Column] = form_data.get("groupbyRows") or []
+ pivot_cols: list[Column] = form_data.get("groupbyColumns") or []
+ groupby_columns = list(pivot_rows) + list(pivot_cols)
+
+ # Mixed timeseries stores secondary groupby under groupby_b
+ groupby_b: list[Column] = form_data.get("groupby_b") or []
+ for col in groupby_b:
+ if col not in groupby_columns:
+ groupby_columns.append(col)
+
+ # Deduplicate while preserving order
+ seen: set[str] = set()
+ columns: list[Column] = []
+
+ def _add_unique(col: Column) -> None:
+ key = col if isinstance(col, str) else col.get("label", str(col))
+ if key not in seen:
+ columns.append(col)
+ seen.add(key)
- columns = groupby_columns.copy()
if x_axis_config and isinstance(x_axis_config, str):
- if x_axis_config not in columns:
- columns.insert(0, x_axis_config)
+ _add_unique(x_axis_config)
elif x_axis_config and isinstance(x_axis_config, dict):
col_name = x_axis_config.get("column_name")
- if col_name and col_name not in columns:
- columns.insert(0, col_name)
+ if col_name and isinstance(col_name, str):
+ _add_unique(col_name)
+
+ for col in groupby_columns:
+ _add_unique(col)
+
return columns
+def _build_query_metrics(form_data: Dict[str, Any]) -> list[Metric]:
+ """Extract metrics from form_data, handling chart-type variations.
+
+ Handles:
+ - ``metrics`` (plural) — most chart types
+ - ``metric`` (singular) — Pie charts
+ - ``metrics_b`` — secondary y-axis in Mixed Timeseries charts
+ """
+ metrics: list[Metric] = list(form_data.get("metrics") or [])
Review Comment:
**Suggestion:** Converting `form_data["metrics"]` with `list(...)` will
split string metrics into characters (for example `"count"` becomes
`["c","o","u","n","t"]`), producing invalid metric payloads and broken query
generation for legacy or malformed form data that sends a single metric string.
Normalize by handling `str` explicitly before casting to list. [type error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Future integration of `_build_query_metrics` risks invalid metrics.
- ⚠️ Legacy single-metric strings could break chart preview queries.
- ⚠️ Debugging character-split metrics in MCP previews is non-trivial.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. The helper `_build_query_metrics` is defined in
`superset/mcp_service/chart/tool/get_chart_preview.py:198-218`. Its first
line initializes
metrics as `metrics: list[Metric] = list(form_data.get("metrics") or [])`
(line 206).
2. If `form_data["metrics"]` is a string instead of a list, e.g. `{"metrics":
"SUM(revenue)"}`, then Python's `list()` constructor will split the string
into
characters. Calling `_build_query_metrics({"metrics": "SUM(revenue)"})` from
any caller
will produce `["S", "U", "M", "(", "r", ...]` instead of `["SUM(revenue)"]`,
because
`list("SUM(revenue)")` iterates characters.
3. The codebase already treats some scalar metric fields as strings (for
example, the
singular `metric` field is a string in
`test_build_query_metrics_singular_for_pie` at
`superset/tests/unit_tests/mcp_service/chart/tool/test_get_chart_preview.py:20-23`,
and
`chart_helpers.resolve_metrics` at
`superset/mcp_service/chart/chart_helpers.py:53-61`
explicitly wraps a singular `metric` into a list). This shows that
single-metric-as-string
is an accepted shape in form_data.
4. Once `_build_query_metrics` is wired into the live preview query
construction (as
implied by the PR description and by tests importing it at
`superset/tests/unit_tests/mcp_service/chart/tool/test_get_chart_preview.py:9-16`),
any
chart preview that receives a string-valued `metrics` (e.g. legacy or
malformed form_data
that placed the single metric under `metrics` instead of `metric`) would
propagate a list
of characters into `ChartDataCommand`. That would yield invalid metric
payloads and likely
a query-building or execution error rather than the intended aggregation
metric.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=6032280b3f9947bd8e2eca60c51f92c6&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=6032280b3f9947bd8e2eca60c51f92c6&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/chart/tool/get_chart_preview.py
**Line:** 206:206
**Comment:**
*Type Error: Converting `form_data["metrics"]` with `list(...)` will
split string metrics into characters (for example `"count"` becomes
`["c","o","u","n","t"]`), producing invalid metric payloads and broken query
generation for legacy or malformed form data that sends a single metric string.
Normalize by handling `str` explicitly before casting to list.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40063&comment_hash=d7ca8820932ae7d01630ab91328e5ad3510eb1ed3036ecfe1fef3b0366671889&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40063&comment_hash=d7ca8820932ae7d01630ab91328e5ad3510eb1ed3036ecfe1fef3b0366671889&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]