codeant-ai-for-open-source[bot] commented on code in PR #37185:
URL: https://github.com/apache/superset/pull/37185#discussion_r2710798126
##########
superset/mcp_service/chart/chart_utils.py:
##########
@@ -34,6 +37,113 @@
from superset.mcp_service.utils.url_utils import get_superset_base_url
from superset.utils import json
+logger = logging.getLogger(__name__)
+
+
+@dataclass
+class DatasetValidationResult:
+ """Result of dataset accessibility validation."""
+
+ is_valid: bool
+ dataset_id: int | None
+ dataset_name: str | None
+ warnings: list[str]
+ error: str | None = None
+
+
+def validate_chart_dataset(
+ chart: Any,
+ check_access: bool = True,
+) -> DatasetValidationResult:
+ """
+ Validate that a chart's dataset exists and is accessible.
+
+ This shared utility should be called by MCP tools after creating or
retrieving
+ charts to detect issues like missing or deleted datasets early.
+
+ Args:
+ chart: A chart-like object with datasource_id, datasource_type
attributes
+ check_access: Whether to also check user permissions (default True)
+
+ Returns:
+ DatasetValidationResult with validation status and any warnings
+ """
+ from superset.daos.dataset import DatasetDAO
+ from superset.mcp_service.auth import has_dataset_access
+
+ warnings: list[str] = []
+ datasource_id = getattr(chart, "datasource_id", None)
+
+ # Check if chart has a datasource reference
+ if datasource_id is None:
+ return DatasetValidationResult(
+ is_valid=False,
+ dataset_id=None,
+ dataset_name=None,
+ warnings=[],
+ error="Chart has no dataset reference (datasource_id is None)",
+ )
+
+ # Try to look up the dataset
+ try:
+ dataset = DatasetDAO.find_by_id(datasource_id)
+
+ if dataset is None:
+ return DatasetValidationResult(
+ is_valid=False,
+ dataset_id=datasource_id,
+ dataset_name=None,
+ warnings=[],
+ error=(
+ f"Dataset (ID: {datasource_id}) has been deleted or does
not "
+ f"exist. The chart will not render correctly. "
+ f"Consider updating the chart to use a different dataset."
+ ),
+ )
+
+ dataset_name = getattr(dataset, "table_name", None) or getattr(
+ dataset, "name", None
+ )
+
+ # Check if it's a virtual dataset (SQL Lab query)
+ is_virtual = bool(getattr(dataset, "sql", None))
+ if is_virtual:
+ warnings.append(
+ f"This chart uses a virtual dataset (SQL-based). "
+ f"If the dataset '{dataset_name}' is deleted, this chart will
break."
+ )
+
+ # Check access permissions if requested
+ if check_access and not has_dataset_access(dataset):
+ return DatasetValidationResult(
+ is_valid=False,
+ dataset_id=datasource_id,
+ dataset_name=dataset_name,
+ warnings=warnings,
+ error=(
+ f"Access denied to dataset '{dataset_name}' (ID:
{datasource_id}). "
+ f"You do not have permission to view this dataset."
+ ),
+ )
+
+ return DatasetValidationResult(
+ is_valid=True,
+ dataset_id=datasource_id,
+ dataset_name=dataset_name,
+ warnings=warnings,
+ error=None,
+ )
+
+ except (AttributeError, ValueError, RuntimeError) as e:
+ logger.warning("Error validating chart dataset %s: %s", datasource_id,
e)
Review Comment:
**Suggestion:** The exception handler at the end of `validate_chart_dataset`
only catches (AttributeError, ValueError, RuntimeError), which will let many
DAO/DB exceptions (e.g., SQLAlchemy errors, connection errors) propagate and
crash callers. Broaden the catch to a generic Exception, log the full exception
including traceback, and preserve the (possibly normalized) `datasource_id` in
the returned `DatasetValidationResult` so callers get a safe, consistent
failure result instead of an uncaught exception. [resource leak]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ MCP endpoints may throw uncaught DB exceptions.
- ⚠️ validate_chart_dataset fails to return safe error payloads.
- ⚠️ Debugging lacks full exception traceback in logs.
```
</details>
```suggestion
except Exception as e:
# Catch any unexpected errors from DAO/security checks and fail
securely.
logger.exception("Error validating chart dataset %s", datasource_id)
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Simulate a backend/database error on dataset lookup by making
DatasetDAO.find_by_id
raise a SQL/DB error.
The lookup occurs in superset/mcp_service/chart/chart_utils.py at line
89: "dataset =
DatasetDAO.find_by_id(datasource_id)".
2. Call validate_chart_dataset(...) (function defined starting at PR hunk
line ~54).
Because the except clause only catches AttributeError, ValueError,
RuntimeError (lines
137-145), a SQLAlchemy/DB exception will not be caught here.
3. The uncaught DB exception propagates out of validate_chart_dataset and up
to the
caller. Any MCP caller (the MCP tools that invoke this utility after chart
creation or on
preview/data requests, as introduced by the PR) will receive the uncaught
exception
instead of a safe DatasetValidationResult.
4. Observed result: HTTP 500 or unhandled exception in MCP workflow instead
of a
controlled DatasetValidationResult with an error message. The auth helper
used inside
validate_chart_dataset is superset/mcp_service/auth.py:has_dataset_access
(lines 97-127),
which itself returns False on exception, but DAO/database errors at lookup
time are not
currently handled by the specific except clause.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/chart/chart_utils.py
**Line:** 137:138
**Comment:**
*Resource Leak: The exception handler at the end of
`validate_chart_dataset` only catches (AttributeError, ValueError,
RuntimeError), which will let many DAO/DB exceptions (e.g., SQLAlchemy errors,
connection errors) propagate and crash callers. Broaden the catch to a generic
Exception, log the full exception including traceback, and preserve the
(possibly normalized) `datasource_id` in the returned `DatasetValidationResult`
so callers get a safe, consistent failure result instead of an uncaught
exception.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]