codeant-ai-for-open-source[bot] commented on code in PR #40935:
URL: https://github.com/apache/superset/pull/40935#discussion_r3398312067


##########
superset/mcp_service/dataset/tool/create_virtual_dataset.py:
##########
@@ -85,6 +86,37 @@ async def create_virtual_dataset(
 
             dataset = CreateDatasetCommand(properties).run()
 
+            if request.metrics or request.calculated_columns:
+                from superset.commands.dataset.update import 
UpdateDatasetCommand
+
+                update_props: dict[str, Any] = {}
+                if request.metrics:
+                    # Merge existing metrics with new ones
+                    existing_metrics = (
+                        [m.to_dict() for m in dataset.metrics]
+                        if getattr(dataset, "metrics", None)
+                        else []
+                    )
+                    update_props["metrics"] = existing_metrics + [
+                        m.model_dump(exclude_none=True) for m in 
request.metrics
+                    ]
+                if request.calculated_columns:
+                    # Merge existing columns with new ones
+                    existing_cols = (
+                        [c.to_dict() for c in dataset.columns]
+                        if getattr(dataset, "columns", None)
+                        else []
+                    )
+                    update_props["columns"] = existing_cols + [
+                        c.model_dump(exclude_none=True)
+                        for c in request.calculated_columns
+                    ]
+
+                with event_logger.log_context(
+                    action="mcp.create_virtual_dataset.update"
+                ):
+                    dataset = UpdateDatasetCommand(dataset.id, 
update_props).run()

Review Comment:
   **Suggestion:** This introduces a two-step create-then-update flow without 
compensating cleanup, so if metric/column update fails, the dataset created in 
the first step is still persisted while the API returns an error. That leaves 
orphan/partial datasets and makes retries fail with dataset-name uniqueness 
conflicts. Make the operation atomic by wrapping both steps in one transaction 
boundary, or explicitly delete the newly created dataset when the update step 
raises. [incomplete implementation]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ MCP virtual dataset tool reports failure but dataset actually created.
   - ⚠️ Subsequent retries with same dataset_name fail due uniqueness.
   - ⚠️ Leaves untracked datasets users did not intend to keep.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Call the MCP tool function `create_virtual_dataset` defined at
   `superset/mcp_service/dataset/tool/create_virtual_dataset.py:43` with a
   `CreateVirtualDatasetRequest` that includes an existing `database_id`, a new
   `dataset_name`, valid `sql`, and a `metrics` list where two items share the 
same
   `"metric_name"` (triggering duplicate-metric validation).
   
   2. Inside the tool, the code at `create_virtual_dataset.py:75-87` builds 
`properties` and
   calls `CreateDatasetCommand(properties).run()`, which validates uniqueness 
and permissions
   and then persists the dataset within a transaction
   (`superset/commands/dataset/create.py:46-52`), committing the new dataset row
   successfully.
   
   3. Because `request.metrics` is non-empty, the tool then constructs 
`update_props` and
   calls `UpdateDatasetCommand(dataset.id, update_props).run()` at
   `create_virtual_dataset.py:89-118`; during `UpdateDatasetCommand.validate()` 
the
   `_validate_metrics` method (`superset/commands/dataset/update.py:14-31`) 
detects duplicate
   `"metric_name"` values, appends `DatasetMetricsDuplicateValidationError`, 
and raises
   `DatasetInvalidError`.
   
   4. This `DatasetInvalidError` from the update step is caught by the `except
   DatasetInvalidError` handler at `create_virtual_dataset.py:141-152`, which 
returns a
   `CreateVirtualDatasetResponse` with `id=None`, `error=str(messages)`, and no 
cleanup of
   the already-committed dataset, leaving an orphan dataset in the database; a 
later retry
   with the same `dataset_name` re-enters `CreateDatasetCommand.validate()`
   (`create.py:69-77`), which now fails `DatasetDAO.validate_uniqueness` and 
raises
   `DatasetExistsValidationError` wrapped in `DatasetInvalidError`, so the tool 
again returns
   an error while the original dataset remains in an unintended partial state.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=8c1dd1b4e1014882a4d28e11df1131f9&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=8c1dd1b4e1014882a4d28e11df1131f9&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/dataset/tool/create_virtual_dataset.py
   **Line:** 89:118
   **Comment:**
        *Incomplete Implementation: This introduces a two-step 
create-then-update flow without compensating cleanup, so if metric/column 
update fails, the dataset created in the first step is still persisted while 
the API returns an error. That leaves orphan/partial datasets and makes retries 
fail with dataset-name uniqueness conflicts. Make the operation atomic by 
wrapping both steps in one transaction boundary, or explicitly delete the newly 
created dataset when the update step raises.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40935&comment_hash=454a858b62f3b9d7fa5b513488855653c1a9de247de18dbbb0791a6eb2f534b0&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40935&comment_hash=454a858b62f3b9d7fa5b513488855653c1a9de247de18dbbb0791a6eb2f534b0&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to