codeant-ai-for-open-source[bot] commented on code in PR #38490:
URL: https://github.com/apache/superset/pull/38490#discussion_r2900103359
##########
superset/databases/schemas.py:
##########
@@ -277,6 +277,20 @@ def extra_validator(value: str) -> str:
)
]
)
+
+ metadata_cache_timeout = extra_.get("metadata_cache_timeout", {})
+ for key in ("schema_cache_timeout", "table_cache_timeout"):
+ timeout = metadata_cache_timeout.get(key)
+ if timeout is not None and (not isinstance(timeout, int) or
timeout < 0):
+ raise ValidationError(
+ [
+ _(
+ "The %(key)s in metadata_cache_timeout must be a "
+ "non-negative integer.",
+ key=key,
+ )
+ ]
+ )
Review Comment:
**Suggestion:** The new validation assumes that `metadata_cache_timeout` is
always a dict and calls `.get()` on it; if existing configs have
`metadata_cache_timeout` set to a non-dict (e.g. an integer, string, list, or
null), this will raise an AttributeError instead of a clean ValidationError,
breaking schema loading for such payloads. Add a type check (and gracefully
handle non-dict/None cases) before iterating over keys so that invalid
structures are rejected with a proper validation error rather than causing an
unexpected exception. [type error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ Database creation with malformed metadata_cache_timeout crashes backend.
- ❌ Database update using DatabasePutSchema fails with AttributeError.
- ⚠️ Test-connection flows using DatabaseTestConnectionSchema can error.
- ⚠️ Admin database connection UI may show generic server error.
- ⚠️ Invalid extra structure not reported as clear validation error.
```
</details>
```suggestion
metadata_cache_timeout = extra_.get("metadata_cache_timeout") or {}
if not isinstance(metadata_cache_timeout, dict):
raise ValidationError(
[
_(
"The metadata_cache_timeout must be a mapping from
string keys "
"to non-negative integer values."
)
]
)
for key in ("schema_cache_timeout", "table_cache_timeout"):
timeout = metadata_cache_timeout.get(key)
if timeout is not None and (not isinstance(timeout, int) or
timeout < 0):
raise ValidationError(
[
_(
"The %(key)s in metadata_cache_timeout must be a
"
"non-negative integer.",
key=key,
)
]
)
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. In `superset/databases/schemas.py`, note that `extra_validator` (around
lines 246–294
in the final file) deserializes the JSON string `extra` and then executes:
`metadata_cache_timeout = extra_.get("metadata_cache_timeout", {})`
followed by `metadata_cache_timeout.get(...)` on lines 281–283.
2. Also in the same file, observe that multiple Marshmallow schemas attach
`extra_validator` to their `extra` field, for example:
- `DatabasePostSchema.extra` (defined later in
`superset/databases/schemas.py`) with
`validate=extra_validator`
- `DatabasePutSchema.extra`, `DatabaseValidateParametersSchema.extra`,
`DatabaseTestConnectionSchema.extra`, and
`DatabaseConnectionSchema.extra` all
similarly use `validate=extra_validator`.
3. In a real execution path, any of these schemas will be used to validate
payloads
containing a string `extra` field. To reproduce via code, run the Superset
backend (or
open a Python shell in the project) and execute:
```python
from superset.databases.schemas import DatabasePostSchema
# `metadata_cache_timeout` is incorrectly set to an integer, not a dict
data = {
"database_name": "test_db",
"sqlalchemy_uri": "sqlite:///test.db",
"extra": '{"metadata_cache_timeout": 600}'
}
DatabasePostSchema().load(data)
```
This triggers `extra_validator` via the `extra` field validation.
4. During `extra_validator` execution, `extra_["metadata_cache_timeout"]`
will be the
integer `600`, so `metadata_cache_timeout = 600`. On line 283, `timeout =
metadata_cache_timeout.get(key)` is executed on an `int`, raising
`AttributeError: 'int'
object has no attribute 'get'`. This bubbles up as an unexpected exception
instead of a
controlled `ValidationError`, causing the database create/update/test flow
that uses this
schema to fail with a 500-style error rather than a user-facing validation
message.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/databases/schemas.py
**Line:** 281:293
**Comment:**
*Type Error: The new validation assumes that `metadata_cache_timeout`
is always a dict and calls `.get()` on it; if existing configs have
`metadata_cache_timeout` set to a non-dict (e.g. an integer, string, list, or
null), this will raise an AttributeError instead of a clean ValidationError,
breaking schema loading for such payloads. Add a type check (and gracefully
handle non-dict/None cases) before iterating over keys so that invalid
structures are rejected with a proper validation error rather than causing an
unexpected exception.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38490&comment_hash=ccbe8c8a905309512e50c4d029311390f181cf42d9471a51bb4022f24f0fcc55&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38490&comment_hash=ccbe8c8a905309512e50c4d029311390f181cf42d9471a51bb4022f24f0fcc55&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]