codeant-ai-for-open-source[bot] commented on code in PR #39922: URL: https://github.com/apache/superset/pull/39922#discussion_r3398655586
########## superset/mcp_service/chart/plugins/pivot_table.py: ########## @@ -0,0 +1,158 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +"""Pivot table chart type plugin.""" + +from __future__ import annotations + +from typing import Any + +from superset.mcp_service.chart.chart_utils import ( + _pivot_table_what, + _summarize_filters, + map_pivot_table_config, +) +from superset.mcp_service.chart.plugin import BaseChartPlugin +from superset.mcp_service.chart.schemas import ColumnRef, PivotTableChartConfig +from superset.mcp_service.chart.validation.dataset_validator import DatasetValidator +from superset.mcp_service.common.error_schemas import ChartGenerationError + + +class PivotTableChartPlugin(BaseChartPlugin): + """Plugin for pivot_table chart type.""" + + chart_type = "pivot_table" + display_name = "Pivot Table" + native_viz_types = { + "pivot_table_v2": "Pivot Table", + } + + def pre_validate( + self, + config: dict[str, Any], + ) -> ChartGenerationError | None: + missing_fields = [] + + if not config.get("rows"): + missing_fields.append("'rows' (row grouping columns)") Review Comment: **Suggestion:** The pre-validation only checks `rows`, but the schema explicitly allows aliases like `groupby` and `dimension`. Requests using those valid aliases are rejected before Pydantic can normalize them, causing false "missing required fields" errors for otherwise valid payloads. Make pre-validation honor the same aliases as the schema. [api mismatch] <details> <summary><b>Severity Level:</b> Major ⚠️</summary> ```mdx - ❌ Pivot table requests using groupby aliases rejected early. - ⚠️ Valid configs fail before Pydantic schema normalization. - ⚠️ LLM clients following schema docs see spurious errors. ``` </details> <details> <summary><b>Steps of Reproduction ✅ </b></summary> ```mdx 1. Invoke the MCP `generate_chart` tool (`superset/mcp_service/chart/tool/generate_chart.py:220-279`) with a pivot table request whose `config` uses a valid alias for rows, e.g. `{\"chart_type\": \"pivot_table\", \"groupby\": [{\"name\": \"region\"}], \"metrics\": [{\"name\": \"revenue\", \"aggregate\": \"SUM\"}]}` and no explicit `rows` key. 2. The request flows into `SchemaValidator.validate_request()` at `superset/mcp_service/chart/validation/schema_validator.py:44-57`, which calls `_pre_validate(request_data)` at `schema_validator.py:64-79`; `_pre_validate` extracts `config` and `chart_type` and dispatches to `_pre_validate_chart_type(chart_type, config)` at `schema_validator.py:27-31`. 3. `_pre_validate_chart_type` loads the `PivotTableChartPlugin` from the registry and calls `plugin.pre_validate(config)` at `schema_validator.py:59-77`; inside `PivotTableChartPlugin.pre_validate` (`superset/mcp_service/chart/plugins/pivot_table.py:44-69`), the code checks only `config.get("rows")` at line 50 and, because the caller used `groupby`, appends `"'rows' (row grouping columns)"` to `missing_fields` and returns a `ChartGenerationError` with `error_code="MISSING_PIVOT_FIELDS"`. 4. Because pre-validation returns an error, `SchemaValidator.validate_request()` exits early at `schema_validator.py:51-52`, and Pydantic is never invoked to apply the `rows` field aliases (`AliasChoices("rows", "groupby", "dimension")` defined on `PivotTableChartConfig.rows` in `superset/mcp_service/chart/schemas.py:106-112`), so a request that would be valid per the schema is rejected with a false "missing required fields" error. ``` </details> [Fix in Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=3844a3b331a541c6a00e1a6f433edeee&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset) | [Fix in VSCode Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=3844a3b331a541c6a00e1a6f433edeee&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset) *(Use Cmd/Ctrl + Click for best experience)* <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset/mcp_service/chart/plugins/pivot_table.py **Line:** 50:51 **Comment:** *Api Mismatch: The pre-validation only checks `rows`, but the schema explicitly allows aliases like `groupby` and `dimension`. Requests using those valid aliases are rejected before Pydantic can normalize them, causing false "missing required fields" errors for otherwise valid payloads. Make pre-validation honor the same aliases as the schema. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix ``` </details> <a href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F39922&comment_hash=be9276764d5b8175ff4b6302fc3758373ab43aaec93a5bfa18b33b4a09bc8dcb&reaction=like'>👍</a> | <a href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F39922&comment_hash=be9276764d5b8175ff4b6302fc3758373ab43aaec93a5bfa18b33b4a09bc8dcb&reaction=dislike'>👎</a> ########## superset/mcp_service/chart/plugins/mixed_timeseries.py: ########## @@ -0,0 +1,170 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +"""Mixed timeseries chart type plugin.""" + +from __future__ import annotations + +from typing import Any + +from superset.mcp_service.chart.chart_utils import ( + _mixed_timeseries_what, + _summarize_filters, + map_mixed_timeseries_config, +) +from superset.mcp_service.chart.plugin import BaseChartPlugin +from superset.mcp_service.chart.schemas import ColumnRef, MixedTimeseriesChartConfig +from superset.mcp_service.chart.validation.dataset_validator import DatasetValidator +from superset.mcp_service.common.error_schemas import ChartGenerationError + + +class MixedTimeseriesChartPlugin(BaseChartPlugin): + """Plugin for mixed_timeseries chart type.""" + + chart_type = "mixed_timeseries" + display_name = "Mixed Timeseries" + native_viz_types = { + "mixed_timeseries": "Mixed Timeseries Chart", + } + + def pre_validate( + self, + config: dict[str, Any], + ) -> ChartGenerationError | None: + missing_fields = [] + + if "x" not in config and "x_axis" not in config: + missing_fields.append("'x' (X-axis temporal column)") + if not config.get("y") and not config.get("metrics"): + missing_fields.append("'y' (primary Y-axis metrics)") + if not config.get("y_secondary") and not config.get("metrics_b"): + missing_fields.append("'y_secondary' (secondary Y-axis metrics)") + + if missing_fields: + return ChartGenerationError( + error_type="missing_mixed_timeseries_fields", + message=( + f"Mixed timeseries chart missing required fields: " + f"{', '.join(missing_fields)}" + ), + details=( + "Mixed timeseries charts require an x-axis, primary metrics, " + "and secondary metrics" + ), + suggestions=[ + "Add 'x' field: {'name': 'date_column'}", + "Add 'y' field: [{'name': 'revenue', 'aggregate': 'SUM'}]", + "Add 'y_secondary': [{'name': 'orders', 'aggregate': 'COUNT'}]", + "Optional: 'primary_kind' and 'secondary_kind' for chart types", + ], + error_code="MISSING_MIXED_TIMESERIES_FIELDS", + ) + + for field_name in ["y", "y_secondary"]: + if not isinstance(config.get(field_name, []), list): + return ChartGenerationError( + error_type=f"invalid_{field_name}_format", + message=f"'{field_name}' must be a list of metrics", + details=( + f"The '{field_name}' field must be an array of metric " + "specifications" + ), + suggestions=[ + f"Wrap in array: '{field_name}': " + "[{'name': 'col', 'aggregate': 'SUM'}]", + ], + error_code=f"INVALID_{field_name.upper()}_FORMAT", + ) + + return None + + def extract_column_refs(self, config: Any) -> list[ColumnRef]: + if not isinstance(config, MixedTimeseriesChartConfig): + return [] + refs: list[ColumnRef] = [config.x] + refs.extend(config.y) + refs.extend(config.y_secondary) + if config.group_by: + refs.extend(config.group_by) + if config.group_by_secondary: + refs.extend(config.group_by_secondary) + if config.filters: + for f in config.filters: + refs.append(ColumnRef(name=f.column)) + return refs + + def to_form_data( + self, config: Any, dataset_id: int | str | None = None + ) -> dict[str, Any]: + return map_mixed_timeseries_config(config, dataset_id=dataset_id) + + def generate_name(self, config: Any, dataset_name: str | None = None) -> str: + what = _mixed_timeseries_what(config) + context = _summarize_filters(config.filters) + return self._with_context(what, context) + + def resolve_viz_type(self, config: Any) -> str: + return "mixed_timeseries" + + def normalize_column_refs(self, config: Any, dataset_context: Any) -> Any: + config_dict = config.model_dump() + + def _norm_single(key: str) -> None: + if config_dict.get(key): + config_dict[key]["name"] = DatasetValidator._get_canonical_column_name( + config_dict[key]["name"], dataset_context + ) + + def _norm_list(key: str) -> None: + if config_dict.get(key): + for col in config_dict[key]: + if col.get("saved_metric"): + col["name"] = DatasetValidator._get_canonical_metric_name( + col["name"], dataset_context + ) + else: + col["name"] = DatasetValidator._get_canonical_column_name( + col["name"], dataset_context + ) Review Comment: **Suggestion:** Non-saved metrics are normalized as if they always have a `name`, but mixed-timeseries metrics can be SQL expressions with `name=None`. This path passes `None` into canonical column lookup, which raises and aborts normalization for the entire chart config. [null pointer] <details> <summary><b>Severity Level:</b> Major ⚠️</summary> ```mdx - ❌ Mixed-timeseries charts with SQL metrics skip name normalization. - ⚠️ generate_chart pipeline loses case-fixing for mixed-timeseries. - ⚠️ Normalization errors logged but silently ignored for these configs. ``` </details> <details> <summary><b>Steps of Reproduction ✅ </b></summary> ```mdx 1. Send a Mixed Timeseries chart request to the MCP `generate_chart` tool (`superset/mcp_service/chart/tool/generate_chart.py:220-57`) with `config.chart_type = "mixed_timeseries"` and at least one primary or secondary metric (`y` or `y_secondary`) defined as a SQL-expression `ColumnRef` (populate `sql_expression` and `label`, omit `name`) as allowed by `MixedTimeseriesChartConfig` (`superset/mcp_service/chart/schemas.py:18-59` and `ColumnRef` at `schemas.py:36-123`). 2. The request flows through `ValidationPipeline.validate_request_with_warnings()` (`superset/mcp_service/chart/validation/pipeline.py:60-80`), which, after schema and dataset validation, calls `_normalize_column_names()` (`pipeline.py:10-53`) to canonicalize column names using the dataset context. 3. `_normalize_column_names()` calls `DatasetValidator.normalize_column_names(config, request.dataset_id, dataset_context)` (`superset/mcp_service/chart/validation/dataset_validator.py:120-167`), which resolves the `"mixed_timeseries"` plugin and invokes `MixedTimeseriesChartPlugin.normalize_column_refs(config, dataset_context)` (`superset/mcp_service/chart/plugins/mixed_timeseries.py:123-150`). 4. In `normalize_column_refs`, `config.model_dump()` yields dict entries where SQL-expression metrics have `col["name"] = None` and `col["saved_metric"] = False`; the `_norm_list` helper at `mixed_timeseries.py:132-142` iterates the metrics, and for such a metric the `else` branch at lines 139-142 passes `None` to `DatasetValidator._get_canonical_column_name()`. That method (`dataset_validator.py:53-84`) calls `column_name.lower()`, raising `AttributeError`, which is caught by `_normalize_column_names()` (`pipeline.py:49-53`) and results in a warning and a fallback to the unnormalized request, so no column names in the Mixed Timeseries config are normalized. ``` </details> [Fix in Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=141bd0ac77b14fd89c9c988c7f93beba&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset) | [Fix in VSCode Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=141bd0ac77b14fd89c9c988c7f93beba&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset) *(Use Cmd/Ctrl + Click for best experience)* <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset/mcp_service/chart/plugins/mixed_timeseries.py **Line:** 139:142 **Comment:** *Null Pointer: Non-saved metrics are normalized as if they always have a `name`, but mixed-timeseries metrics can be SQL expressions with `name=None`. This path passes `None` into canonical column lookup, which raises and aborts normalization for the entire chart config. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix ``` </details> <a href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F39922&comment_hash=25904df3b7502e840a901dcd827b9d996d09c591ff6b831f611597fc7910db30&reaction=like'>👍</a> | <a href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F39922&comment_hash=25904df3b7502e840a901dcd827b9d996d09c591ff6b831f611597fc7910db30&reaction=dislike'>👎</a> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
