codeant-ai-for-open-source[bot] commented on code in PR #36076:
URL: https://github.com/apache/superset/pull/36076#discussion_r3463972585
##########
superset/utils/schema.py:
##########
@@ -83,3 +84,35 @@ def validate_external_url(value: Optional[str]) -> None:
)
if not parsed.netloc:
raise ValidationError("URL must be absolute and include a host.")
+
+
+def validate_query_context_metadata(value: Union[bytes, bytearray, str, None])
-> None:
+ """
+ Validator for query_context field to ensure it contains required metadata.
+
+ Validates that the query_context JSON contains the required 'datasource'
and
+ 'queries' fields needed for chart data retrieval.
+
+ :raises ValidationError: if value is not valid JSON or missing required
fields
+ :param value: a JSON string that should contain datasource and queries
metadata
+ """
+ if value is None or value == "":
+ return # Allow None values and empty strings
+
+ # Reuse existing JSON validation logic
+ validate_json(value)
+
+ # Parse and validate the structure
+ parsed_data = json.loads(value)
+
+ # Validate required fields exist in the query_context
+ if not isinstance(parsed_data, dict):
+ error_msg = "Query context must be a valid JSON object"
+ raise ValidationError(error_msg)
+
+ # When query_context is provided (not None), validate it has required
fields
+ required_fields = {"datasource", "queries"}
+ missing_fields: set[str] = required_fields - parsed_data.keys()
+ if missing_fields:
+ fields_str = ", ".join(sorted(missing_fields))
+ raise ValidationError(f"Query context is missing required fields:
{fields_str}")
Review Comment:
**Suggestion:** The validation only checks that `datasource` and `queries`
keys exist, but does not validate their value types/content, so payloads like
`{"datasource": null, "queries": null}` pass and fail later when query context
is built/executed. Add structural checks (e.g., `datasource` must be an object
and `queries` must be a list, ideally non-empty) to enforce the contract at
schema-validation time. [incomplete implementation]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ Saved charts can store unusable query_context payloads.
- ❌ Chart data API `/api/v1/chart/<id>/data` returns 400 on use.
- ⚠️ MCP `get_chart_data` tooling may silently fall back or fail.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. The `validate_query_context_metadata` function at
`superset/utils/schema.py:89-118`
only verifies that the parsed JSON object is a dict and that its key set
includes
`"datasource"` and `"queries"` via `required_fields = {"datasource",
"queries"}` and
`missing_fields = required_fields - parsed_data.keys()`, without validating
value types or
contents, as shown in lines 114-118.
2. This validator is applied to `query_context` in both `ChartPostSchema` and
`ChartPutSchema` at `superset/charts/schemas.py:32-36` and `95-99`, so a
payload like
`{"datasource": null, "queries": null}` serialized as a JSON string will
pass schema
validation and be persisted into `Slice.query_context` (see
`superset/models/slice.py:69-84` where `query_context =
Column(utils.MediumText())`).
3. When chart data is later requested via `GET /api/v1/chart/<id>/data` in
`superset/charts/data/api.py:150-31`, the code does `json_body =
json.loads(chart.query_context)` and then calls `query_context =
self._create_query_context_from_form(json_body)` at line 217, where
`_create_query_context_from_form` delegates to
`ChartDataQueryContextSchema().load(form_data)` (see the analogous
implementation in
`superset/tasks/async_queries.py:1-12`). That schema expects a structured
`datasource`
object and a list of query dicts; with `datasource=None` and `queries=None`,
it will raise
a `ValidationError`.
4. The `ValidationError` from `ChartDataQueryContextSchema` is caught in
`superset/charts/data/api.py:224-41` and converted into a 400 response, and
similar
parsing/validation occurs in MCP chart tooling
(`superset/mcp_service/chart/tool/get_chart_data.py:450-485` and
`superset/mcp_service/chart/tool/get_chart_sql.py:170-209`), meaning charts
saved with
such minimally valid-but-structurally-wrong `query_context` pass the current
metadata
validator but subsequently fail whenever their data or SQL is requested,
leaving persisted
but unusable chart configurations.
```
</details>
[](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=602beb9408da4ad6ada758a00e183a2d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
[](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=602beb9408da4ad6ada758a00e183a2d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/utils/schema.py
**Line:** 114:118
**Comment:**
*Incomplete Implementation: The validation only checks that
`datasource` and `queries` keys exist, but does not validate their value
types/content, so payloads like `{"datasource": null, "queries": null}` pass
and fail later when query context is built/executed. Add structural checks
(e.g., `datasource` must be an object and `queries` must be a list, ideally
non-empty) to enforce the contract at schema-validation time.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F36076&comment_hash=c3a3ee4a4553d795227be629c9302d4ea0c55f4f1b29e70758db24c6ec27d7b9&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F36076&comment_hash=c3a3ee4a4553d795227be629c9302d4ea0c55f4f1b29e70758db24c6ec27d7b9&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]