Re: [PR] fix(charts): add query_context validation to prevent incomplete metadata [superset]

via GitHub Tue, 23 Jun 2026 18:23:48 -0700


codeant-ai-for-open-source[bot] commented on code in PR #36076:
URL: https://github.com/apache/superset/pull/36076#discussion_r3463972585



##########
superset/utils/schema.py:
##########
@@ -83,3 +84,35 @@ def validate_external_url(value: Optional[str]) -> None:
         )
     if not parsed.netloc:
         raise ValidationError("URL must be absolute and include a host.")
+
+
+def validate_query_context_metadata(value: Union[bytes, bytearray, str, None]) 
-> None:
+    """
+    Validator for query_context field to ensure it contains required metadata.
+
+    Validates that the query_context JSON contains the required 'datasource' 
and
+    'queries' fields needed for chart data retrieval.
+
+    :raises ValidationError: if value is not valid JSON or missing required 
fields
+    :param value: a JSON string that should contain datasource and queries 
metadata
+    """
+    if value is None or value == "":
+        return  # Allow None values and empty strings
+
+    # Reuse existing JSON validation logic
+    validate_json(value)
+
+    # Parse and validate the structure
+    parsed_data = json.loads(value)
+
+    # Validate required fields exist in the query_context
+    if not isinstance(parsed_data, dict):
+        error_msg = "Query context must be a valid JSON object"
+        raise ValidationError(error_msg)
+
+    # When query_context is provided (not None), validate it has required 
fields
+    required_fields = {"datasource", "queries"}
+    missing_fields: set[str] = required_fields - parsed_data.keys()
+    if missing_fields:
+        fields_str = ", ".join(sorted(missing_fields))
+        raise ValidationError(f"Query context is missing required fields: 
{fields_str}")

Review Comment:
   **Suggestion:** The validation only checks that `datasource` and `queries` 
keys exist, but does not validate their value types/content, so payloads like 
`{"datasource": null, "queries": null}` pass and fail later when query context 
is built/executed. Add structural checks (e.g., `datasource` must be an object 
and `queries` must be a list, ideally non-empty) to enforce the contract at 
schema-validation time. [incomplete implementation]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Saved charts can store unusable query_context payloads.
   - ❌ Chart data API `/api/v1/chart/<id>/data` returns 400 on use.
   - ⚠️ MCP `get_chart_data` tooling may silently fall back or fail.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. The `validate_query_context_metadata` function at 
`superset/utils/schema.py:89-118`
   only verifies that the parsed JSON object is a dict and that its key set 
includes
   `"datasource"` and `"queries"` via `required_fields = {"datasource", 
"queries"}` and
   `missing_fields = required_fields - parsed_data.keys()`, without validating 
value types or
   contents, as shown in lines 114-118.
   
   2. This validator is applied to `query_context` in both `ChartPostSchema` and
   `ChartPutSchema` at `superset/charts/schemas.py:32-36` and `95-99`, so a 
payload like
   `{"datasource": null, "queries": null}` serialized as a JSON string will 
pass schema
   validation and be persisted into `Slice.query_context` (see
   `superset/models/slice.py:69-84` where `query_context = 
Column(utils.MediumText())`).
   
   3. When chart data is later requested via `GET /api/v1/chart/<id>/data` in
   `superset/charts/data/api.py:150-31`, the code does `json_body =
   json.loads(chart.query_context)` and then calls `query_context =
   self._create_query_context_from_form(json_body)` at line 217, where
   `_create_query_context_from_form` delegates to
   `ChartDataQueryContextSchema().load(form_data)` (see the analogous 
implementation in
   `superset/tasks/async_queries.py:1-12`). That schema expects a structured 
`datasource`
   object and a list of query dicts; with `datasource=None` and `queries=None`, 
it will raise
   a `ValidationError`.
   
   4. The `ValidationError` from `ChartDataQueryContextSchema` is caught in
   `superset/charts/data/api.py:224-41` and converted into a 400 response, and 
similar
   parsing/validation occurs in MCP chart tooling
   (`superset/mcp_service/chart/tool/get_chart_data.py:450-485` and
   `superset/mcp_service/chart/tool/get_chart_sql.py:170-209`), meaning charts 
saved with
   such minimally valid-but-structurally-wrong `query_context` pass the current 
metadata
   validator but subsequently fail whenever their data or SQL is requested, 
leaving persisted
   but unusable chart configurations.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=602beb9408da4ad6ada758a00e183a2d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=602beb9408da4ad6ada758a00e183a2d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/utils/schema.py
   **Line:** 114:118
   **Comment:**
        *Incomplete Implementation: The validation only checks that 
`datasource` and `queries` keys exist, but does not validate their value 
types/content, so payloads like `{"datasource": null, "queries": null}` pass 
and fail later when query context is built/executed. Add structural checks 
(e.g., `datasource` must be an object and `queries` must be a list, ideally 
non-empty) to enforce the contract at schema-validation time.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F36076&comment_hash=c3a3ee4a4553d795227be629c9302d4ea0c55f4f1b29e70758db24c6ec27d7b9&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F36076&comment_hash=c3a3ee4a4553d795227be629c9302d4ea0c55f4f1b29e70758db24c6ec27d7b9&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix(charts): add query_context validation to prevent incomplete metadata [superset]

Reply via email to