codeant-ai-for-open-source[bot] commented on code in PR #40935:
URL: https://github.com/apache/superset/pull/40935#discussion_r3398312067
##########
superset/mcp_service/dataset/tool/create_virtual_dataset.py:
##########
@@ -85,6 +86,37 @@ async def create_virtual_dataset(
dataset = CreateDatasetCommand(properties).run()
+ if request.metrics or request.calculated_columns:
+ from superset.commands.dataset.update import
UpdateDatasetCommand
+
+ update_props: dict[str, Any] = {}
+ if request.metrics:
+ # Merge existing metrics with new ones
+ existing_metrics = (
+ [m.to_dict() for m in dataset.metrics]
+ if getattr(dataset, "metrics", None)
+ else []
+ )
+ update_props["metrics"] = existing_metrics + [
+ m.model_dump(exclude_none=True) for m in
request.metrics
+ ]
+ if request.calculated_columns:
+ # Merge existing columns with new ones
+ existing_cols = (
+ [c.to_dict() for c in dataset.columns]
+ if getattr(dataset, "columns", None)
+ else []
+ )
+ update_props["columns"] = existing_cols + [
+ c.model_dump(exclude_none=True)
+ for c in request.calculated_columns
+ ]
+
+ with event_logger.log_context(
+ action="mcp.create_virtual_dataset.update"
+ ):
+ dataset = UpdateDatasetCommand(dataset.id,
update_props).run()
Review Comment:
**Suggestion:** This introduces a two-step create-then-update flow without
compensating cleanup, so if metric/column update fails, the dataset created in
the first step is still persisted while the API returns an error. That leaves
orphan/partial datasets and makes retries fail with dataset-name uniqueness
conflicts. Make the operation atomic by wrapping both steps in one transaction
boundary, or explicitly delete the newly created dataset when the update step
raises. [incomplete implementation]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ MCP virtual dataset tool reports failure but dataset actually created.
- ⚠️ Subsequent retries with same dataset_name fail due uniqueness.
- ⚠️ Leaves untracked datasets users did not intend to keep.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Call the MCP tool function `create_virtual_dataset` defined at
`superset/mcp_service/dataset/tool/create_virtual_dataset.py:43` with a
`CreateVirtualDatasetRequest` that includes an existing `database_id`, a new
`dataset_name`, valid `sql`, and a `metrics` list where two items share the
same
`"metric_name"` (triggering duplicate-metric validation).
2. Inside the tool, the code at `create_virtual_dataset.py:75-87` builds
`properties` and
calls `CreateDatasetCommand(properties).run()`, which validates uniqueness
and permissions
and then persists the dataset within a transaction
(`superset/commands/dataset/create.py:46-52`), committing the new dataset row
successfully.
3. Because `request.metrics` is non-empty, the tool then constructs
`update_props` and
calls `UpdateDatasetCommand(dataset.id, update_props).run()` at
`create_virtual_dataset.py:89-118`; during `UpdateDatasetCommand.validate()`
the
`_validate_metrics` method (`superset/commands/dataset/update.py:14-31`)
detects duplicate
`"metric_name"` values, appends `DatasetMetricsDuplicateValidationError`,
and raises
`DatasetInvalidError`.
4. This `DatasetInvalidError` from the update step is caught by the `except
DatasetInvalidError` handler at `create_virtual_dataset.py:141-152`, which
returns a
`CreateVirtualDatasetResponse` with `id=None`, `error=str(messages)`, and no
cleanup of
the already-committed dataset, leaving an orphan dataset in the database; a
later retry
with the same `dataset_name` re-enters `CreateDatasetCommand.validate()`
(`create.py:69-77`), which now fails `DatasetDAO.validate_uniqueness` and
raises
`DatasetExistsValidationError` wrapped in `DatasetInvalidError`, so the tool
again returns
an error while the original dataset remains in an unintended partial state.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=8c1dd1b4e1014882a4d28e11df1131f9&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=8c1dd1b4e1014882a4d28e11df1131f9&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/dataset/tool/create_virtual_dataset.py
**Line:** 89:118
**Comment:**
*Incomplete Implementation: This introduces a two-step
create-then-update flow without compensating cleanup, so if metric/column
update fails, the dataset created in the first step is still persisted while
the API returns an error. That leaves orphan/partial datasets and makes retries
fail with dataset-name uniqueness conflicts. Make the operation atomic by
wrapping both steps in one transaction boundary, or explicitly delete the newly
created dataset when the update step raises.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40935&comment_hash=454a858b62f3b9d7fa5b513488855653c1a9de247de18dbbb0791a6eb2f534b0&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40935&comment_hash=454a858b62f3b9d7fa5b513488855653c1a9de247de18dbbb0791a6eb2f534b0&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]