codeant-ai-for-open-source[bot] commented on code in PR #37639:
URL: https://github.com/apache/superset/pull/37639#discussion_r2759657634
##########
superset/mcp_service/chart/tool/get_chart_preview.py:
##########
@@ -185,6 +185,19 @@ def generate(self) -> TablePreview | ChartError:
error_type="InvalidChart",
)
+ # Build columns list: include both x_axis and groupby
+ x_axis_config = form_data.get("x_axis")
+ groupby_columns = form_data.get("groupby", [])
+
+ columns = groupby_columns.copy()
+ if x_axis_config and isinstance(x_axis_config, str):
+ if x_axis_config not in columns:
+ columns.insert(0, x_axis_config)
+ elif x_axis_config and isinstance(x_axis_config, dict):
+ col_name = x_axis_config.get("column_name")
+ if col_name and col_name not in columns:
+ columns.insert(0, col_name)
Review Comment:
**Suggestion:** When the x-axis is configured as an ad-hoc/SQL expression (a
dict without `column_name`), the table preview's `columns` list will omit the
x-axis entirely because it only looks for `column_name` instead of using the
full dict configuration, so those charts will still lose their x-axis in the
preview. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ Table preview missing x-axis for ad-hoc x_axis charts.
- ⚠️ MCP table previews used by clients show incorrect columns.
- ⚠️ ASCIIPreview unaffected (already includes dict handling).
```
</details>
```suggestion
# Include dict-based x_axis configs (e.g., ad-hoc/SQL
expressions) directly
if x_axis_config not in columns:
columns.insert(0, x_axis_config)
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Create a chart form_data where the x-axis is an ad-hoc/SQL expression
represented as a
dict WITHOUT a `column_name` key (e.g., x_axis = {"sqlExpression":
"DATE_TRUNC('month',
order_date)"}). Put that form_data into a saved chart or a transient
form_data key.
2. Call the MCP preview tool endpoint which runs get_chart_preview ->
_get_chart_preview_internal at
superset/mcp_service/chart/tool/get_chart_preview.py. The
TablePreview path is exercised when request.format == "table", which uses
TablePreviewStrategy.generate (function context around lines 173-201).
3. Inside TablePreviewStrategy.generate, the code that builds the columns
list checks the
x_axis dict branch at lines 196-199
(superset/mcp_service/chart/tool/get_chart_preview.py:196-199). Because the
dict lacks
"column_name", col_name is None and the code does not insert anything into
`columns`.
4. QueryContextFactory.create is then called with queries containing
"columns": columns
(see factory.create call around lines 201-214). ChartDataCommand.run
(superset/commands/chart/data/get_data_command.py:34-73) executes the query
payload built
from this QueryContext; since the ad-hoc x_axis expression was not added to
`columns`, the
resulting query payload lacks the x-axis dimension and the returned table
preview will be
missing the x-axis column.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/chart/tool/get_chart_preview.py
**Line:** 197:199
**Comment:**
*Logic Error: When the x-axis is configured as an ad-hoc/SQL expression
(a dict without `column_name`), the table preview's `columns` list will omit
the x-axis entirely because it only looks for `column_name` instead of using
the full dict configuration, so those charts will still lose their x-axis in
the preview.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/mcp_service/chart/preview_utils.py:
##########
@@ -64,12 +64,27 @@ def generate_preview_from_form_data(
# Create query context from form data using factory
from superset.common.query_context_factory import QueryContextFactory
+ # Build columns list: include x_axis and groupby for XY charts,
+ # fall back to form_data "columns" for table charts
+ x_axis_config = form_data.get("x_axis")
+ groupby_columns = form_data.get("groupby", [])
+ raw_columns = form_data.get("columns", [])
Review Comment:
**Suggestion:** Calling `.copy()` on `groupby` or `columns` assumes they are
always lists, but if form data ever provides a single string or other scalar (a
pattern used elsewhere in Superset), this will raise an AttributeError at
runtime; normalizing these values to lists before copying makes the code robust
to both single and multiple values. [type error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ MCP preview generation crashes on scalar groupby inputs.
- ⚠️ Bar/line chart previews unavailable for affected requests.
- ⚠️ Affects generate_preview_from_form_data preview endpoint logic.
```
</details>
```suggestion
groupby_columns = form_data.get("groupby") or []
if not isinstance(groupby_columns, list):
groupby_columns = [groupby_columns]
raw_columns = form_data.get("columns") or []
if not isinstance(raw_columns, list):
raw_columns = [raw_columns]
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Start Superset with the MCP service and invoke preview generation logic
which calls
generate_preview_from_form_data() defined in
superset/mcp_service/chart/preview_utils.py
(function block beginning at line ~54 in that file).
2. Send form_data where "groupby" is provided as a single string (e.g.,
{"groupby":
"year"}) and "columns" is omitted or empty. The code path at
preview_utils.py:69-73
executes.
3. At preview_utils.py:73 the code executes groupby_columns.copy(), but
groupby_columns is
a str not a list, so Python raises AttributeError: 'str' object has no
attribute 'copy'.
4. The AttributeError bubbles out, causing preview generation to fail and
ChartError
fallback to be skipped; reproduce by calling the MCP preview flow (per PR
testing
instructions) with a single-string groupby value.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/chart/preview_utils.py
**Line:** 70:71
**Comment:**
*Type Error: Calling `.copy()` on `groupby` or `columns` assumes they
are always lists, but if form data ever provides a single string or other
scalar (a pattern used elsewhere in Superset), this will raise an
AttributeError at runtime; normalizing these values to lists before copying
makes the code robust to both single and multiple values.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
tests/unit_tests/mcp_service/chart/test_preview_utils.py:
##########
@@ -0,0 +1,146 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Tests for preview_utils query context column building.
+"""
+
+
+class TestPreviewUtilsColumnBuilding:
+ """Tests for x_axis + groupby column building in
generate_preview_from_form_data.
+
+ The function must build the columns list from both x_axis and groupby for
+ XY charts, and fall back to form_data["columns"] for table charts.
+ """
+
+ def test_xy_chart_uses_x_axis_and_groupby(self):
+ """Test XY chart form_data builds columns from x_axis + groupby."""
+ form_data = {
+ "x_axis": "territory",
+ "groupby": ["year"],
+ "metrics": [{"label": "SUM(sales)"}],
+ }
+
+ x_axis_config = form_data.get("x_axis")
+ groupby_columns = form_data.get("groupby", [])
+ raw_columns = form_data.get("columns", [])
+
+ columns = raw_columns.copy() if raw_columns else groupby_columns.copy()
+ if x_axis_config and isinstance(x_axis_config, str):
+ if x_axis_config not in columns:
+ columns.insert(0, x_axis_config)
+ elif x_axis_config and isinstance(x_axis_config, dict):
+ col_name = x_axis_config.get("column_name")
+ if col_name and col_name not in columns:
+ columns.insert(0, col_name)
+
+ assert columns == ["territory", "year"]
+
+ def test_table_chart_uses_columns_field(self):
+ """Test table chart form_data uses 'columns' field directly."""
+ form_data = {
+ "columns": ["name", "region", "sales"],
+ "metrics": [],
+ }
+
+ x_axis_config = form_data.get("x_axis")
+ groupby_columns = form_data.get("groupby", [])
+ raw_columns = form_data.get("columns", [])
+
+ columns = raw_columns.copy() if raw_columns else groupby_columns.copy()
+ if x_axis_config and isinstance(x_axis_config, str):
+ if x_axis_config not in columns:
+ columns.insert(0, x_axis_config)
+
+ assert columns == ["name", "region", "sales"]
+
+ def test_xy_chart_x_axis_dict_format(self):
+ """Test XY chart with x_axis as dict (column_name key)."""
+ form_data = {
+ "x_axis": {"column_name": "order_date"},
+ "groupby": ["product_type"],
+ "metrics": [{"label": "SUM(revenue)"}],
+ }
+
+ x_axis_config = form_data.get("x_axis")
+ groupby_columns = form_data.get("groupby", [])
+ raw_columns = form_data.get("columns", [])
+
+ columns = raw_columns.copy() if raw_columns else groupby_columns.copy()
+ if x_axis_config and isinstance(x_axis_config, str):
+ if x_axis_config not in columns:
+ columns.insert(0, x_axis_config)
+ elif x_axis_config and isinstance(x_axis_config, dict):
+ col_name = x_axis_config.get("column_name")
+ if col_name and col_name not in columns:
+ columns.insert(0, col_name)
+
+ assert columns == ["order_date", "product_type"]
+
+ def test_no_x_axis_no_columns_uses_groupby(self):
+ """Test fallback to groupby when no x_axis and no columns."""
+ form_data = {
+ "groupby": ["category"],
+ "metrics": [{"label": "COUNT(*)"}],
+ }
+
+ x_axis_config = form_data.get("x_axis")
+ groupby_columns = form_data.get("groupby", [])
+ raw_columns = form_data.get("columns", [])
+
+ columns = raw_columns.copy() if raw_columns else groupby_columns.copy()
+ if x_axis_config and isinstance(x_axis_config, str):
+ if x_axis_config not in columns:
+ columns.insert(0, x_axis_config)
+
+ assert columns == ["category"]
+
+ def test_empty_form_data_returns_empty_columns(self):
+ """Test empty form_data returns empty columns list."""
+ form_data: dict = {
+ "metrics": [{"label": "COUNT(*)"}],
+ }
+
+ x_axis_config = form_data.get("x_axis")
+ groupby_columns = form_data.get("groupby", [])
+ raw_columns = form_data.get("columns", [])
+
+ columns = raw_columns.copy() if raw_columns else groupby_columns.copy()
+ if x_axis_config and isinstance(x_axis_config, str):
+ if x_axis_config not in columns:
+ columns.insert(0, x_axis_config)
+
+ assert columns == []
+
+ def test_x_axis_not_duplicated_when_in_groupby(self):
+ """Test x_axis is not added if already present in groupby."""
+ form_data = {
+ "x_axis": "territory",
+ "groupby": ["territory", "year"],
+ "metrics": [{"label": "SUM(sales)"}],
+ }
+
+ x_axis_config = form_data.get("x_axis")
+ groupby_columns = form_data.get("groupby", [])
+ raw_columns = form_data.get("columns", [])
+
+ columns = raw_columns.copy() if raw_columns else groupby_columns.copy()
+ if x_axis_config and isinstance(x_axis_config, str):
+ if x_axis_config not in columns:
+ columns.insert(0, x_axis_config)
Review Comment:
**Suggestion:** The logic that builds the `columns` list from `form_data` is
duplicated in every test method, which is a maintainability bug: if the
column-building rules change, some tests can silently become inconsistent with
others or with the production logic. Extracting this into a single helper
method used by all tests removes the duplication and ensures there is exactly
one place to update the behavior. [possible bug]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Tests duplication in
tests/unit_tests/mcp_service/chart/test_preview_utils.py
- ❌ CI may miss regressions in preview column logic
- ⚠️ Developer friction updating column-building rules
- ⚠️ Harder to ensure parity with production function
```
</details>
```suggestion
def _build_columns(self, form_data: dict) -> list:
"""Helper to build columns from form_data in a single place."""
x_axis_config = form_data.get("x_axis")
groupby_columns = form_data.get("groupby", [])
raw_columns = form_data.get("columns", [])
columns = raw_columns.copy() if raw_columns else
groupby_columns.copy()
if x_axis_config and isinstance(x_axis_config, str):
if x_axis_config not in columns:
columns.insert(0, x_axis_config)
elif x_axis_config and isinstance(x_axis_config, dict):
col_name = x_axis_config.get("column_name")
if col_name and col_name not in columns:
columns.insert(0, col_name)
return columns
def test_xy_chart_uses_x_axis_and_groupby(self):
"""Test XY chart form_data builds columns from x_axis + groupby."""
form_data = {
"x_axis": "territory",
"groupby": ["year"],
"metrics": [{"label": "SUM(sales)"}],
}
columns = self._build_columns(form_data)
assert columns == ["territory", "year"]
def test_table_chart_uses_columns_field(self):
"""Test table chart form_data uses 'columns' field directly."""
form_data = {
"columns": ["name", "region", "sales"],
"metrics": [],
}
columns = self._build_columns(form_data)
assert columns == ["name", "region", "sales"]
def test_xy_chart_x_axis_dict_format(self):
"""Test XY chart with x_axis as dict (column_name key)."""
form_data = {
"x_axis": {"column_name": "order_date"},
"groupby": ["product_type"],
"metrics": [{"label": "SUM(revenue)"}],
}
columns = self._build_columns(form_data)
assert columns == ["order_date", "product_type"]
def test_no_x_axis_no_columns_uses_groupby(self):
"""Test fallback to groupby when no x_axis and no columns."""
form_data = {
"groupby": ["category"],
"metrics": [{"label": "COUNT(*)"}],
}
columns = self._build_columns(form_data)
assert columns == ["category"]
def test_empty_form_data_returns_empty_columns(self):
"""Test empty form_data returns empty columns list."""
form_data: dict = {
"metrics": [{"label": "COUNT(*)"}],
}
columns = self._build_columns(form_data)
assert columns == []
def test_x_axis_not_duplicated_when_in_groupby(self):
"""Test x_axis is not added if already present in groupby."""
form_data = {
"x_axis": "territory",
"groupby": ["territory", "year"],
"metrics": [{"label": "SUM(sales)"}],
}
columns = self._build_columns(form_data)
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Open the test file
tests/unit_tests/mcp_service/chart/test_preview_utils.py and inspect
lines 23-146 (the TestPreviewUtilsColumnBuilding class). You will see the
columns-building
logic duplicated in every test method (lines 38-145 contain repeated blocks
that compute
`columns` from form_data).
2. Run the specific test class: pytest
tests/unit_tests/mcp_service/chart/test_preview_utils.py::TestPreviewUtilsColumnBuilding.
The tests exercise only the inline, duplicated logic inside this file and do
not import or
call the production helper.
3. Now change the production implementation that builds the query-context
columns — e.g.,
edit src/mcp_service/chart/preview_utils.py (function
generate_preview_from_form_data) to
alter column-building behavior (add/remove x_axis handling). Because the
unit tests in
tests/unit_tests/mcp_service/chart/test_preview_utils.py reimplement the
logic themselves,
running pytest again will NOT detect the divergence between production code
and tests.
4. Observe CI or local test runs: tests in
tests/unit_tests/mcp_service/chart/test_preview_utils.py still pass (they
validate the
duplicated logic, not the production function), demonstrating that
duplicated test logic
can silently allow regressions. Extracting the logic into a single helper
used by tests
would ensure tests validate the actual production behavior.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** tests/unit_tests/mcp_service/chart/test_preview_utils.py
**Line:** 30:144
**Comment:**
*Possible Bug: The logic that builds the `columns` list from
`form_data` is duplicated in every test method, which is a maintainability bug:
if the column-building rules change, some tests can silently become
inconsistent with others or with the production logic. Extracting this into a
single helper method used by all tests removes the duplication and ensures
there is exactly one place to update the behavior.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/mcp_service/chart/tool/get_chart_preview.py:
##########
@@ -279,6 +292,19 @@ def generate(self) -> VegaLitePreview | ChartError:
utils_json.loads(self.chart.params) if self.chart.params
else {}
)
+ # Build columns list: include both x_axis and groupby
+ x_axis_config = form_data.get("x_axis")
+ groupby_columns = form_data.get("groupby", [])
+
+ columns = groupby_columns.copy()
+ if x_axis_config and isinstance(x_axis_config, str):
+ if x_axis_config not in columns:
+ columns.insert(0, x_axis_config)
+ elif x_axis_config and isinstance(x_axis_config, dict):
+ col_name = x_axis_config.get("column_name")
+ if col_name and col_name not in columns:
+ columns.insert(0, col_name)
Review Comment:
**Suggestion:** The Vega-Lite preview has the same issue for dict-based
x-axis configs: if the x-axis is an ad-hoc/SQL expression without a
`column_name`, it is never added to the `columns` list, so the resulting query
data still lacks the x-axis dimension for those charts. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ Vega-Lite preview missing x-axis for ad-hoc x_axis charts.
- ⚠️ Interactive previews show incorrect or collapsed series.
- ⚠️ Downstream visualization logic builds specs without x-axis.
```
</details>
```suggestion
# Include dict-based x_axis configs (e.g., ad-hoc/SQL
expressions) directly
if x_axis_config not in columns:
columns.insert(0, x_axis_config)
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Prepare chart form_data where x_axis is a dict ad-hoc expression without
`column_name`
(e.g., x_axis = {"sqlExpression": "EXTRACT(YEAR FROM created_at)"}).
2. Request a Vega-Lite preview via MCP (get_chart_preview ->
_get_chart_preview_internal).
When the preview format is "vega_lite", VegaLitePreviewStrategy.generate is
executed
(context around lines 282-316).
3. VegaLitePreviewStrategy.generate builds the `columns` list and reaches
the dict branch
at lines 303-306
(superset/mcp_service/chart/tool/get_chart_preview.py:303-306). Because
there is no "column_name" key, col_name is falsy and the ad-hoc x_axis is
not inserted
into `columns`.
4. QueryContextFactory.create is invoked with queries including those
`columns`
(factory.create call around lines 308-316). ChartDataCommand.run
(superset/commands/chart/data/get_data_command.py:34-73) runs the query
without the x-axis
expression, so the Vega-Lite preview data lacks the x-axis field and the
visualization
spec generated from returned data is missing that dimension.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/chart/tool/get_chart_preview.py
**Line:** 304:306
**Comment:**
*Logic Error: The Vega-Lite preview has the same issue for dict-based
x-axis configs: if the x-axis is an ad-hoc/SQL expression without a
`column_name`, it is never added to the `columns` list, so the resulting query
data still lacks the x-axis dimension for those charts.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/mcp_service/chart/preview_utils.py:
##########
@@ -64,12 +64,27 @@ def generate_preview_from_form_data(
# Create query context from form data using factory
from superset.common.query_context_factory import QueryContextFactory
+ # Build columns list: include x_axis and groupby for XY charts,
+ # fall back to form_data "columns" for table charts
+ x_axis_config = form_data.get("x_axis")
+ groupby_columns = form_data.get("groupby", [])
+ raw_columns = form_data.get("columns", [])
+
+ columns = raw_columns.copy() if raw_columns else groupby_columns.copy()
+ if x_axis_config and isinstance(x_axis_config, str):
+ if x_axis_config not in columns:
+ columns.insert(0, x_axis_config)
+ elif x_axis_config and isinstance(x_axis_config, dict):
+ col_name = x_axis_config.get("column_name")
Review Comment:
**Suggestion:** For dict-style x_axis configs (e.g., adhoc or
expression-based x-axes), the code only reads `column_name`, so when the config
instead uses `label` or `sqlExpression` (as is common for adhoc columns) the
x-axis is silently omitted from the `columns` list and the query still misses
the x-axis dimension. [possible bug]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ Adhoc/expression x_axis omitted from MCP previews.
- ⚠️ Vega-Lite and table previews lose x-axis dimension.
- ⚠️ Affects generate_preview_from_form_data correctness.
```
</details>
```suggestion
col_name = (
x_axis_config.get("column_name")
or x_axis_config.get("label")
or x_axis_config.get("sqlExpression")
)
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Create form_data where x_axis is an adhoc/expression dict, for example:
{"x_axis": {"sqlExpression": "DATE_TRUNC('year', created_at)"} ,
"groupby": ["genre"],
"columns": []}.
Call generate_preview_from_form_data() in
superset/mcp_service/chart/preview_utils.py
(function around line 54).
2. Execution reaches preview_utils.py:77-80. The code only reads
x_axis_config.get("column_name") and ignores "sqlExpression" or "label".
3. Because "column_name" is absent, col_name is None and no insertion into
columns occurs,
so the x-axis SQL expression isn't included in the query context.
4. The generated query and preview (table/Vega-Lite) miss the x-axis
dimension; this
reproduces the original class of bugs the PR fixes when x_axis is provided
as an adhoc
expression.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/chart/preview_utils.py
**Line:** 78:78
**Comment:**
*Possible Bug: For dict-style x_axis configs (e.g., adhoc or
expression-based x-axes), the code only reads `column_name`, so when the config
instead uses `label` or `sqlExpression` (as is common for adhoc columns) the
x-axis is silently omitted from the `columns` list and the query still misses
the x-axis dimension.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]