codeant-ai-for-open-source[bot] commented on code in PR #39922:
URL: https://github.com/apache/superset/pull/39922#discussion_r3280434043


##########
superset/mcp_service/chart/plugins/xy.py:
##########
@@ -0,0 +1,192 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""XY chart type plugin (line, bar, area, scatter)."""
+
+from __future__ import annotations
+
+import logging
+from typing import Any
+
+from superset.mcp_service.chart.chart_utils import (
+    _xy_chart_context,
+    _xy_chart_what,
+    map_xy_config,
+)
+from superset.mcp_service.chart.plugin import BaseChartPlugin
+from superset.mcp_service.chart.schemas import ColumnRef, XYChartConfig
+from superset.mcp_service.chart.validation.dataset_validator import DatasetValidator
+from superset.mcp_service.chart.validation.runtime.cardinality_validator import (
+    CardinalityValidator,
+)
+from superset.mcp_service.chart.validation.runtime.format_validator import (
+    FormatTypeValidator,
+)
+from superset.mcp_service.common.error_schemas import ChartGenerationError
+
+logger = logging.getLogger(__name__)
+
+
+class XYChartPlugin(BaseChartPlugin):
+    """Plugin for xy chart type (line, bar, area, scatter)."""
+
+    chart_type = "xy"
+    display_name = "Line / Bar / Area / Scatter Chart"
+    native_viz_types = {
+        "echarts_timeseries_line": "Line Chart",
+        "echarts_timeseries_bar": "Bar Chart",
+        "echarts_area": "Area Chart",
+        "echarts_timeseries_scatter": "Scatter Plot",
+    }
+
+    def pre_validate(
+        self,
+        config: dict[str, Any],
+    ) -> ChartGenerationError | None:
+        # x is optional — defaults to dataset's main_dttm_col in map_xy_config
+        if "y" not in config:
+            return ChartGenerationError(
+                error_type="missing_xy_fields",
+                message="XY chart missing required field: 'y' (Y-axis metrics)",
+                details=(
+                    "XY charts require Y-axis (metrics) specifications. "
+                    "X-axis is optional and defaults to the dataset's primary "
+                    "datetime column when omitted."
+                ),
+                suggestions=[
+                    "Add 'y' field: [{'name': 'metric_column', 'aggregate': 'SUM'}]",
+                    "Example: {'chart_type': 'xy', 'x': {'name': 'date'}, "
+                    "'y': [{'name': 'sales', 'aggregate': 'SUM'}]}",
+                ],
+                error_code="MISSING_XY_FIELDS",
+            )
+
+        if not isinstance(config.get("y", []), list):
+            return ChartGenerationError(
+                error_type="invalid_y_format",
+                message="Y-axis must be a list of metrics",
+                details="The 'y' field must be an array of metric specifications",
+                suggestions=[
+                    "Wrap Y-axis metric in array: 'y': [{'name': 'column', "
+                    "'aggregate': 'SUM'}]",
+                    "Multiple metrics supported: 'y': [metric1, metric2, ...]",
+                ],
+                error_code="INVALID_Y_FORMAT",
+            )
+
+        return None
+
+    def extract_column_refs(self, config: Any) -> list[ColumnRef]:
+        if not isinstance(config, XYChartConfig):
+            return []
+        refs: list[ColumnRef] = []
+        if config.x is not None:
+            refs.append(config.x)
+        refs.extend(config.y)
+        if config.group_by:
+            refs.extend(config.group_by)
+        if config.filters:
+            for f in config.filters:
+                refs.append(ColumnRef(name=f.column))
+        return refs
+
+    def to_form_data(
+        self, config: Any, dataset_id: int | str | None = None
+    ) -> dict[str, Any]:
+        return map_xy_config(config, dataset_id=dataset_id)
+
+    def normalize_column_refs(self, config: Any, dataset_context: Any) -> Any:
+        config_dict = config.model_dump()
+        get_canonical = DatasetValidator._get_canonical_column_name
+
+        if config_dict.get("x"):
+            config_dict["x"]["name"] = get_canonical(
+                config_dict["x"]["name"], dataset_context
+            )
+        for y_col in config_dict.get("y") or []:
+            y_col["name"] = get_canonical(y_col["name"], dataset_context)

Review Comment:
   **Suggestion:** The Y-series loop canonicalizes every `name` without checking
   `saved_metric`. If a saved metric's name matches a physical column name
   case-insensitively, canonicalization can rewrite the metric reference to the
   column's name, so the generated form data no longer points at the saved metric
   and the query can fail or use the wrong metric. Skip canonicalization for
   saved metrics in the `y` loop. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Saved-metric XY charts can mis-resolve Y-axis metrics.
   - ⚠️ update_chart tool updates may error for XY charts.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure a dataset so that a saved metric name collides case-insensitively with a
   physical column name; these names are surfaced in `DatasetContext.available_columns` and
   `available_metrics` built by `DatasetValidator._get_dataset_context` in
   `superset/mcp_service/chart/validation/dataset_validator.py:197-252`.
   
   2. Create an XY chart whose config uses that saved metric on the Y-axis: the Pydantic
   `XYChartConfig.y` field (`superset/mcp_service/chart/schemas.py:23-41`) accepts
   `ColumnRef` entries with `saved_metric=True`, and such a config is passed into the MCP
   `update_chart` tool as `request.config` at
   `superset/mcp_service/chart/tool/update_chart.py:413-415`.
   
   3. Call `update_chart` with this XY config; inside `update_chart`, when the chart has a
   datasource_id, column normalization is applied via
   `DatasetValidator.normalize_column_names(parsed_config, chart.datasource_id)` at
   `update_chart.py:425-427`, which routes to the XY plugin's `normalize_column_refs`
   implementation in `superset/mcp_service/chart/plugins/xy.py:112-126`.
   
   4. In `XYChartPlugin.normalize_column_refs`, the loop at `xy.py:120-121` unconditionally
   canonicalizes every Y-series dict with `get_canonical`, which delegates to
   `DatasetValidator._get_canonical_column_name` (`dataset_validator.py:303-335`) that
   prefers columns over metrics; for a saved metric whose name overlaps a column, this can
   rewrite the saved metric `name` to the column's canonical name, so when `map_xy_config`
   later converts Y columns to metrics via `create_metric_object` (`chart_utils.py:657-695`),
   the Superset engine attempts to resolve a non-existent saved metric name and the chart
   fails or renders the wrong metric.
   ```
   </details>
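
A minimal sketch of the suggested guard, assuming each dumped Y entry carries a boolean `saved_metric` key, as the `ColumnRef` schema referenced above suggests. `canonical_name` is a hypothetical stand-in for `DatasetValidator._get_canonical_column_name`, and the dataset context is reduced to a plain dict purely for illustration; the real helper and context object may differ.

```python
from typing import Any


def canonical_name(name: str, context: dict[str, list[str]]) -> str:
    """Hypothetical stand-in: case-insensitive lookup that prefers columns."""
    for candidate in context["columns"] + context["metrics"]:
        if candidate.lower() == name.lower():
            return candidate
    return name


def normalize_y(y_refs: list[dict[str, Any]], context: dict[str, list[str]]) -> None:
    """Canonicalize Y entries in place, leaving saved metrics untouched."""
    for y_col in y_refs:
        if y_col.get("saved_metric"):
            # Saved metrics are resolved by metric name, so keep it verbatim.
            continue
        y_col["name"] = canonical_name(y_col["name"], context)


# A column and a saved metric whose names differ only by case.
context = {"columns": ["Revenue"], "metrics": ["revenue"]}
y = [
    {"name": "revenue", "saved_metric": True},
    {"name": "REVENUE", "saved_metric": False},
]
normalize_y(y, context)
```

With the guard in place, only the second entry is rewritten to the column's spelling; the saved metric keeps the name it was submitted with.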
   



##########
superset/mcp_service/chart/plugins/table.py:
##########
@@ -0,0 +1,128 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Table chart type plugin."""
+
+from __future__ import annotations
+
+from typing import Any
+
+from superset.mcp_service.chart.chart_utils import (
+    _summarize_filters,
+    _table_chart_what,
+    map_table_config,
+)
+from superset.mcp_service.chart.plugin import BaseChartPlugin
+from superset.mcp_service.chart.schemas import ColumnRef, TableChartConfig
+from superset.mcp_service.chart.validation.dataset_validator import DatasetValidator
+from superset.mcp_service.common.error_schemas import ChartGenerationError
+
+
+class TableChartPlugin(BaseChartPlugin):
+    """Plugin for table chart type."""
+
+    chart_type = "table"
+    display_name = "Table"
+    native_viz_types = {
+        "table": "Table",
+        "ag-grid-table": "Interactive Table",
+    }
+
+    def pre_validate(
+        self,
+        config: dict[str, Any],
+    ) -> ChartGenerationError | None:
+        if "columns" not in config:
+            return ChartGenerationError(
+                error_type="missing_columns",
+                message="Table chart missing required field: columns",
+                details=(
+                    "Table charts require a 'columns' array to specify which "
+                    "columns to display"
+                ),
+                suggestions=[
+                    "Add 'columns' field with array of column specifications",
+                    "Example: 'columns': [{'name': 'product'}, {'name': 'sales', "
+                    "'aggregate': 'SUM'}]",
+                    "Each column can have optional 'aggregate' for metrics",
+                ],
+                error_code="MISSING_COLUMNS",
+            )
+
+        if not isinstance(config.get("columns", []), list):
+            return ChartGenerationError(
+                error_type="invalid_columns_format",
+                message="Columns must be a list",
+                details="The 'columns' field must be an array of column specifications",
+                suggestions=[
+                    "Ensure columns is an array: 'columns': [...]",
+                    "Each column should be an object with 'name' field",
+                ],
+                error_code="INVALID_COLUMNS_FORMAT",
+            )
+
+        return None
+
+    def extract_column_refs(self, config: Any) -> list[ColumnRef]:
+        if not isinstance(config, TableChartConfig):
+            return []
+        refs: list[ColumnRef] = list(config.columns)
+        if config.filters:
+            for f in config.filters:
+                refs.append(ColumnRef(name=f.column))
+        return refs
+
+    def to_form_data(
+        self, config: Any, dataset_id: int | str | None = None
+    ) -> dict[str, Any]:
+        return map_table_config(config)
+
+    def generate_name(self, config: Any, dataset_name: str | None = None) -> str:
+        what = _table_chart_what(config, dataset_name)
+        context = _summarize_filters(config.filters)
+        return self._with_context(what, context)
+
+    def resolve_viz_type(self, config: Any) -> str:
+        return getattr(config, "viz_type", "table")
+
+    def normalize_column_refs(self, config: Any, dataset_context: Any) -> Any:
+        config_dict = config.model_dump()
+        get_canonical = DatasetValidator._get_canonical_column_name
+
+        for col in config_dict.get("columns") or []:
+            col["name"] = get_canonical(col["name"], dataset_context)

Review Comment:
   **Suggestion:** `normalize_column_refs` rewrites every column name, including
   saved metrics. Because canonical lookup prefers dataset columns over metrics,
   a saved metric whose name collides case-insensitively with a column can be
   rewritten to the column name and then fail metric resolution at query time.
   Skip canonicalization when `saved_metric` is true (as done in other plugins).
   [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Saved-metric table charts can fail at query time.
   - ⚠️ MCP update_chart mutations may persist invalid metric references.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Create or locate a dataset with both a physical column and a saved metric that share
   the same name case-insensitively (dataset metrics are exposed via
   `DatasetValidator._get_dataset_context` in
   `superset/mcp_service/chart/validation/dataset_validator.py:197-253`, which stores metric
   names in `available_metrics`).
   
   2. Create a table chart whose config uses that saved metric as a column ref: the Pydantic
   schema `TableChartConfig.columns` (in `superset/mcp_service/chart/schemas.py:84-122`)
   accepts `ColumnRef` objects with `saved_metric=True`, and the chart is later updated via
   the MCP `update_chart` tool in `superset/mcp_service/chart/tool/update_chart.py:304-595`.
   
   3. Call the MCP `update_chart` tool with a request whose `config.chart_type` is `"table"`
   and whose `columns` include this `ColumnRef` (parsed as `request.config` at
   `update_chart.py:413-415`); inside `update_chart`, column names are normalized by
   `DatasetValidator.normalize_column_names` (called at `update_chart.py:425-427`), which
   dispatches to `TableChartPlugin.normalize_column_refs` in
   `superset/mcp_service/chart/plugins/table.py:102-110`.
   
   4. In `TableChartPlugin.normalize_column_refs`, every column dict (including saved
   metrics) is canonicalized via `DatasetValidator._get_canonical_column_name` at
   `dataset_validator.py:303-335` (loop at `table.py:106-107`), which prefers dataset columns
   over metrics; this can rewrite a saved metric's `name` to the colliding column name, so
   later when `map_table_config` builds metrics using `create_metric_object`
   (`chart_utils.py:392-487`), the saved metric reference no longer matches any dataset
   metric and the Superset query engine cannot resolve it via its `metrics_by_name` lookup,
   causing query-time failures for such table charts.
   ```
   </details>
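
The failure mode in step 4 can be illustrated in isolation. This is only a sketch: `canonical_name` is a hypothetical stand-in for a column-preferring, case-insensitive lookup, and `metrics_by_name` is modeled as a plain dict from saved metric name to expression, which may not match the engine's actual structure.

```python
def canonical_name(name, columns, metrics):
    """Hypothetical stand-in: case-insensitive lookup that prefers columns."""
    for candidate in [*columns, *metrics]:
        if candidate.lower() == name.lower():
            return candidate
    return name


columns = ["Total_Sales"]
metrics_by_name = {"total_sales": "SUM(amount)"}  # saved metric name -> expression

ref = {"name": "total_sales", "saved_metric": True}

# Unconditional canonicalization rewrites the metric name to the column's spelling,
# so an exact-name metric lookup no longer finds it.
rewritten = canonical_name(ref["name"], columns, list(metrics_by_name))
lookup_after_rewrite = metrics_by_name.get(rewritten)

# Skipping saved metrics keeps the original name, which still resolves.
kept = ref["name"] if ref["saved_metric"] else rewritten
lookup_when_skipped = metrics_by_name.get(kept)
```

Here `lookup_after_rewrite` comes back empty while `lookup_when_skipped` still finds the metric expression, which is the difference the suggested `saved_metric` check is meant to preserve.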
   



##########
superset/mcp_service/chart/tool/update_chart.py:
##########
@@ -196,6 +196,29 @@ def _validate_update_against_dataset(
             }
         )
 
+    # Column existence + fuzzy-match validation
+    # (mirrors generate_chart pipeline layer 2)
+    from superset.mcp_service.chart.validation.dataset_validator import DatasetValidator
+
+    is_col_valid, col_error = DatasetValidator.validate_against_dataset(
+        parsed_config, dataset.id
+    )
+    if not is_col_valid and col_error is not None:
+        logger.warning(
+            "update_chart column validation failed for chart %s: %s",
+            getattr(chart, "id", None),
+            col_error,
+        )
+        return GenerateChartResponse.model_validate(
+            {
+                "chart": None,
+                "error": col_error.model_dump(),
+                "success": False,
+                "schema_version": "2.0",
+                "api_version": "v1",
+            }
+        )

Review Comment:
   **Suggestion:** This adds a full dataset validation call before
   `validate_and_compile`, but `validate_and_compile` already performs the same
   dataset validation internally using the already-loaded dataset object. The
   extra call fetches the dataset context again and repeats the validation on
   every update request, increasing latency and database load. [performance]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ update_chart validation calls DatasetValidator twice per request.
   - ⚠️ Extra dataset queries increase chart-update latency and load.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Call the MCP `update_chart` tool defined at
   `superset/mcp_service/chart/tool/update_chart.py:304-595` with `generate_preview=False`
   and a non-null `config` so that `parsed_config` is set (`update_chart.py:413-415`) and a
   full visualization update runs.
   
   2. After building the update payload via `_build_update_payload`
   (`update_chart.py:87-121`), `update_chart` extracts `new_form_data` and, before
   persisting, invokes `_validate_update_against_dataset(parsed_config, new_form_data,
   chart)` inside the `mcp.update_chart.validation` log context at `update_chart.py:450-453`.
   
   3. Inside `_validate_update_against_dataset` (`update_chart.py:167-251`), once the dataset
   ORM object is resolved (`dataset` from `DatasetDAO.find_by_id` at 178-180), the new code
   block at lines 199-205 calls `DatasetValidator.validate_against_dataset(parsed_config,
   dataset.id)` without passing a `dataset_context`, causing `DatasetValidator` to fetch
   dataset context from the database via `_get_dataset_context`
   (`superset/mcp_service/chart/validation/dataset_validator.py:197-253`) and run full column
   and aggregation validation once.
   
   4. Immediately afterwards, `_validate_update_against_dataset` calls
   `validate_and_compile(parsed_config, form_data, dataset, run_compile_check=True)` at
   `update_chart.py:222-224`, and `validate_and_compile` in
   `superset/mcp_service/chart/compile.py:19-84` builds a new `dataset_context` from the same
   ORM dataset and calls `DatasetValidator.validate_against_dataset(config, dataset.id,
   dataset_context=dataset_context)` again (compile.py:48-52), resulting in two complete
   Tier-1 dataset validation passes and two dataset-context constructions for every
   update_chart request.
   ```
   </details>
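
One way to picture the redundancy, and what passing an already-built context would save, is the sketch below. `FakeValidator` is a hypothetical stand-in that only counts context fetches; it borrows the `dataset_context` keyword mentioned in step 4 but is not the real `DatasetValidator` API. Dropping the extra call entirely, as the suggestion implies, would have the same effect on the count.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeValidator:
    """Hypothetical stand-in that counts dataset-context fetches."""

    fetches: int = 0

    def get_context(self, dataset_id: int) -> dict:
        self.fetches += 1  # stands in for a database round trip
        return {"dataset_id": dataset_id, "columns": ["ds", "sales"]}

    def validate(
        self, config: dict, dataset_id: int, dataset_context: Optional[dict] = None
    ) -> bool:
        # Fetch the context only when the caller did not supply one.
        context = dataset_context or self.get_context(dataset_id)
        return all(col in context["columns"] for col in config["columns"])


validator = FakeValidator()
config = {"columns": ["sales"]}

# Two independent validation calls: each one fetches its own context.
validator.validate(config, 1)
validator.validate(config, 1)
fetches_without_reuse = validator.fetches

# Build the context once and hand it to every validation pass.
validator.fetches = 0
shared = validator.get_context(1)
validator.validate(config, 1, dataset_context=shared)
validator.validate(config, 1, dataset_context=shared)
fetches_with_reuse = validator.fetches
```

In this toy model the fetch count drops from two to one per request when the context is shared.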
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

