Re: [PR] fix(mcp): add dataset validation for chart tools [superset]

via GitHub Tue, 20 Jan 2026 19:10:56 -0800


codeant-ai-for-open-source[bot] commented on code in PR #37185:
URL: https://github.com/apache/superset/pull/37185#discussion_r2710798126



##########
superset/mcp_service/chart/chart_utils.py:
##########
@@ -34,6 +37,113 @@
 from superset.mcp_service.utils.url_utils import get_superset_base_url
 from superset.utils import json
 
+logger = logging.getLogger(__name__)
+
+
+@dataclass
+class DatasetValidationResult:
+    """Result of dataset accessibility validation."""
+
+    is_valid: bool
+    dataset_id: int | None
+    dataset_name: str | None
+    warnings: list[str]
+    error: str | None = None
+
+
+def validate_chart_dataset(
+    chart: Any,
+    check_access: bool = True,
+) -> DatasetValidationResult:
+    """
+    Validate that a chart's dataset exists and is accessible.
+
+    This shared utility should be called by MCP tools after creating or 
retrieving
+    charts to detect issues like missing or deleted datasets early.
+
+    Args:
+        chart: A chart-like object with datasource_id, datasource_type 
attributes
+        check_access: Whether to also check user permissions (default True)
+
+    Returns:
+        DatasetValidationResult with validation status and any warnings
+    """
+    from superset.daos.dataset import DatasetDAO
+    from superset.mcp_service.auth import has_dataset_access
+
+    warnings: list[str] = []
+    datasource_id = getattr(chart, "datasource_id", None)
+
+    # Check if chart has a datasource reference
+    if datasource_id is None:
+        return DatasetValidationResult(
+            is_valid=False,
+            dataset_id=None,
+            dataset_name=None,
+            warnings=[],
+            error="Chart has no dataset reference (datasource_id is None)",
+        )
+
+    # Try to look up the dataset
+    try:
+        dataset = DatasetDAO.find_by_id(datasource_id)
+
+        if dataset is None:
+            return DatasetValidationResult(
+                is_valid=False,
+                dataset_id=datasource_id,
+                dataset_name=None,
+                warnings=[],
+                error=(
+                    f"Dataset (ID: {datasource_id}) has been deleted or does 
not "
+                    f"exist. The chart will not render correctly. "
+                    f"Consider updating the chart to use a different dataset."
+                ),
+            )
+
+        dataset_name = getattr(dataset, "table_name", None) or getattr(
+            dataset, "name", None
+        )
+
+        # Check if it's a virtual dataset (SQL Lab query)
+        is_virtual = bool(getattr(dataset, "sql", None))
+        if is_virtual:
+            warnings.append(
+                f"This chart uses a virtual dataset (SQL-based). "
+                f"If the dataset '{dataset_name}' is deleted, this chart will 
break."
+            )
+
+        # Check access permissions if requested
+        if check_access and not has_dataset_access(dataset):
+            return DatasetValidationResult(
+                is_valid=False,
+                dataset_id=datasource_id,
+                dataset_name=dataset_name,
+                warnings=warnings,
+                error=(
+                    f"Access denied to dataset '{dataset_name}' (ID: 
{datasource_id}). "
+                    f"You do not have permission to view this dataset."
+                ),
+            )
+
+        return DatasetValidationResult(
+            is_valid=True,
+            dataset_id=datasource_id,
+            dataset_name=dataset_name,
+            warnings=warnings,
+            error=None,
+        )
+
+    except (AttributeError, ValueError, RuntimeError) as e:
+        logger.warning("Error validating chart dataset %s: %s", datasource_id, 
e)

Review Comment:
   **Suggestion:** The exception handler at the end of `validate_chart_dataset` 
only catches (AttributeError, ValueError, RuntimeError), which will let many 
DAO/DB exceptions (e.g., SQLAlchemy errors, connection errors) propagate and 
crash callers. Broaden the catch to a generic Exception, log the full exception 
including traceback, and preserve the (possibly normalized) `datasource_id` in 
the returned `DatasetValidationResult` so callers get a safe, consistent 
failure result instead of an uncaught exception. [resource leak]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ MCP endpoints may throw uncaught DB exceptions.
   - ⚠️ validate_chart_dataset fails to return safe error payloads.
   - ⚠️ Debugging lacks full exception traceback in logs.
   ```
   </details>
   
   ```suggestion
       except Exception as e:
           # Catch any unexpected errors from DAO/security checks and fail 
securely.
           logger.exception("Error validating chart dataset %s", datasource_id)
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Simulate a backend/database error on dataset lookup by making 
DatasetDAO.find_by_id
   raise a SQL/DB error.
   
      The lookup occurs in superset/mcp_service/chart/chart_utils.py at line 
89: "dataset =
      DatasetDAO.find_by_id(datasource_id)".
   
   2. Call validate_chart_dataset(...) (function defined starting at PR hunk 
line ~54).
   Because the except clause only catches AttributeError, ValueError, 
RuntimeError (lines
   137-145), a SQLAlchemy/DB exception will not be caught here.
   
   3. The uncaught DB exception propagates out of validate_chart_dataset and up 
to the
   caller. Any MCP caller (the MCP tools that invoke this utility after chart 
creation or on
   preview/data requests, as introduced by the PR) will receive the uncaught 
exception
   instead of a safe DatasetValidationResult.
   
   4. Observed result: HTTP 500 or unhandled exception in MCP workflow instead 
of a
   controlled DatasetValidationResult with an error message. The auth helper 
used inside
   validate_chart_dataset is superset/mcp_service/auth.py:has_dataset_access 
(lines 97-127),
   which itself returns False on exception, but DAO/database errors at lookup 
time are not
   currently handled by the specific except clause.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/chart/chart_utils.py
   **Line:** 137:138
   **Comment:**
        *Resource Leak: The exception handler at the end of 
`validate_chart_dataset` only catches (AttributeError, ValueError, 
RuntimeError), which will let many DAO/DB exceptions (e.g., SQLAlchemy errors, 
connection errors) propagate and crash callers. Broaden the catch to a generic 
Exception, log the full exception including traceback, and preserve the 
(possibly normalized) `datasource_id` in the returned `DatasetValidationResult` 
so callers get a safe, consistent failure result instead of an uncaught 
exception.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix(mcp): add dataset validation for chart tools [superset]

Reply via email to