codeant-ai-for-open-source[bot] commented on code in PR #40935:
URL: https://github.com/apache/superset/pull/40935#discussion_r3389489799


##########
superset/mcp_service/dataset/schemas.py:
##########
@@ -444,6 +444,16 @@ class CreateVirtualDatasetRequest(BaseModel):
         None,
         description="Human-readable description of the dataset (optional).",
     )
+    metrics: List[Dict[str, Any]] | None = Field(
+        None,
+        description="Optional list of saved metrics to create. Each dictionary 
"
+        "must have 'metric_name' and 'expression'.",
+    )
+    calculated_columns: List[Dict[str, Any]] | None = Field(
+        None,
+        description="Optional list of calculated columns to create. Each 
dictionary "
+        "must have 'column_name' and 'expression'.",
+    )

Review Comment:
   **Suggestion:** `metrics` and `calculated_columns` accept arbitrary 
dictionaries without required-key validation, but downstream update validation 
assumes keys like `metric_name`/`column_name` exist and can raise uncaught 
`KeyError`, causing a 500. Add explicit typed item schemas or validators that 
enforce required fields before command execution. [api mismatch]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ MCP virtual dataset creation can 500 on bad metrics.
   - ⚠️ LLM-generated payloads may intermittently hit uncaught KeyError.
   - ⚠️ Clients receive no structured field-level validation feedback.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Invoke the MCP tool `create_virtual_dataset` (implemented in
   `superset/mcp_service/dataset/tool/create_virtual_dataset.py:4-82`) via the 
MCP server
   with a `CreateVirtualDatasetRequest` payload.
   
   2. Construct the request so that `request.metrics` contains at least one 
dict missing
   `metric_name`, e.g. `metrics=[{"expression": "SUM(col1)"}]`, while other 
required fields
   (`database_id`, `sql`, `dataset_name`) are valid (schema at
   `superset/mcp_service/dataset/schemas.py:14-57`).
   
   3. Inside `create_virtual_dataset`, after 
`CreateDatasetCommand(properties).run()`
   succeeds (see `superset/commands/dataset/create.py:42-52`), the code at
   `create_virtual_dataset.py:50-60` calls `UpdateDatasetCommand(dataset.id, 
{"metrics":
   request.metrics}).run()`.
   
   4. `UpdateDatasetCommand.run()` 
(`superset/commands/dataset/update.py:83-97`) calls
   `self.validate()`, which in `_validate_semantics()` (`update.py:214-224`) 
calls
   `_validate_metrics(metrics, exceptions)`; `_validate_metrics` calls
   `_get_duplicates(metrics, "metric_name")` (`update.py:26-30, 86-93`), which 
evaluates
   `[item[key] for item in data]` and raises a `KeyError` because `metric_name` 
is missing.
   This `KeyError` is not caught by the transaction decorator (which only 
handles
   `SQLAlchemyError`/`ValueError`) nor by the specific
   `DatasetInvalidError`/`DatasetCreateFailedError`/`DatasetUpdateFailedError` 
handlers in
   `create_virtual_dataset.py:83-105`, so it falls through to the generic 
`except Exception`
   at `create_virtual_dataset.py:106-111` and is re-raised, causing the MCP 
tool call to fail
   with an unhandled server error instead of a structured validation error.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=38b06c53d245440998004a3247cf9cbd&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=38b06c53d245440998004a3247cf9cbd&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/dataset/schemas.py
   **Line:** 447:456
   **Comment:**
        *Api Mismatch: `metrics` and `calculated_columns` accept arbitrary 
dictionaries without required-key validation, but downstream update validation 
assumes keys like `metric_name`/`column_name` exist and can raise uncaught 
`KeyError`, causing a 500. Add explicit typed item schemas or validators that 
enforce required fields before command execution.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40935&comment_hash=cfb28253c76c8832a5a3711ebc2cd4d1470302077ce2bdae78f5b4d3dcb3afb3&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40935&comment_hash=cfb28253c76c8832a5a3711ebc2cd4d1470302077ce2bdae78f5b4d3dcb3afb3&reaction=dislike'>👎</a>



##########
superset/mcp_service/dataset/tool/create_virtual_dataset.py:
##########
@@ -85,6 +86,18 @@ async def create_virtual_dataset(
 
             dataset = CreateDatasetCommand(properties).run()
 
+            if request.metrics or request.calculated_columns:
+                from superset.commands.dataset.update import 
UpdateDatasetCommand
+                
+                update_props: dict[str, Any] = {}
+                if request.metrics:
+                    update_props["metrics"] = request.metrics

Review Comment:
   **Suggestion:** Passing only new `metrics` to `UpdateDatasetCommand` can 
remove pre-existing dataset metrics, since update semantics replace the metric 
set rather than append to it. Include existing metrics in the update payload 
(plus new ones) or use an append-specific path. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Auto-generated default metrics silently removed from new datasets.
   - ⚠️ Any pre-existing saved metrics are fully replaced.
   - ⚠️ MCP clients cannot rely on default metric availability.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Use the MCP `create_virtual_dataset` tool
   (`superset/mcp_service/dataset/tool/create_virtual_dataset.py:4-82`) with a
   `CreateVirtualDatasetRequest` that produces a valid dataset (so 
`CreateDatasetCommand`
   succeeds) and provide `metrics` containing only the new logical metrics you 
want to add,
   e.g. `[{"metric_name": "m1", "expression": "SUM(col1)"}]`, while the 
database layer is
   configured to auto-populate default metrics via `SqlaTable.fetch_metadata()`
   (`superset/connectors/sqla/models.py:11-28, 31-43, 55-15`).
   
   2. After `CreateDatasetCommand(properties).run()`
   (`superset/commands/dataset/create.py:42-52`) and 
`dataset.fetch_metadata()`, the dataset
   has an initial set of metrics (derived from `database.get_metrics(...)` and
   `add_missing_metrics`, `models.py:18-27, 31-15`), e.g. default `count` 
metrics.
   
   3. Because `request.metrics` is truthy, `create_virtual_dataset` executes 
the block at
   `create_virtual_dataset.py:50-60`, building `update_props["metrics"] = 
request.metrics`
   (containing only the new metric definitions) and calling 
`UpdateDatasetCommand(dataset.id,
   update_props).run()`.
   
   4. Inside `UpdateDatasetCommand.run()` 
(`superset/commands/dataset/update.py:83-97`),
   `DatasetDAO.update(self._model, attributes=self._properties)` is invoked;
   `DatasetDAO.update` (`superset/daos/dataset.py:20-32`) sees `"metrics"` and 
calls
   `update_metrics(model, attributes.pop("metrics"))`. `update_metrics`
   (`dataset.py:114-158`) bulk-inserts metrics without `id` and then deletes 
all existing
   metrics whose ids are not present in `property_metrics_by_id` (empty for 
new-only
   metrics), thereby removing any pre-existing metrics (including 
auto-populated ones), so
   the final dataset retains only the newly submitted `metrics` list.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=05cf9400f6fe495db712083e9180f31e&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=05cf9400f6fe495db712083e9180f31e&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/dataset/tool/create_virtual_dataset.py
   **Line:** 93:94
   **Comment:**
        *Logic Error: Passing only new `metrics` to `UpdateDatasetCommand` can 
remove pre-existing dataset metrics, since update semantics replace the metric 
set rather than append to it. Include existing metrics in the update payload 
(plus new ones) or use an append-specific path.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40935&comment_hash=922b7dea5def2133e6a91942ece82cb9c8f2da726460665f9fce223edf6ce954&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40935&comment_hash=922b7dea5def2133e6a91942ece82cb9c8f2da726460665f9fce223edf6ce954&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to