codeant-ai-for-open-source[bot] commented on code in PR #40499:
URL: https://github.com/apache/superset/pull/40499#discussion_r3405717589
##########
superset/sql/parse.py:
##########
@@ -54,6 +55,43 @@
logger = logging.getLogger(__name__)
+# Fallback parse-length bound applied when no Flask app context is active
+# (Alembic migrations, scripts, isolated unit tests). The runtime value is
+# read from `SQL_MAX_PARSE_LENGTH` in app config; keep these two in sync.
+_DEFAULT_MAX_PARSE_LENGTH: int = 1_000_000
+
+
+def _check_script_length(script: str, engine: str | None) -> None:
+ """
+ Reject scripts whose UTF-8 byte length exceeds the configured maximum
+ before they reach sqlglot. Sits at every code path in this module that
+ hands a string to ``sqlglot.parse`` or ``sqlglot.parse_one`` so the
+ bound cannot be bypassed by a direct caller.
+
+ The check is in bytes, not Unicode code points, because the
+ threat model is parser memory and CPU on the encoded payload that
+ sqlglot ingests.
+ """
+ if has_app_context():
+ max_length = current_app.config.get(
+ "SQL_MAX_PARSE_LENGTH", _DEFAULT_MAX_PARSE_LENGTH
+ )
+ else:
+ max_length = _DEFAULT_MAX_PARSE_LENGTH
Review Comment:
**Suggestion:** The no-app-context branch ignores the configured
`SQL_MAX_PARSE_LENGTH` and always uses the hardcoded default, so setting
`SQL_MAX_PARSE_LENGTH = None` (or any custom value) does not take effect in
scripts/migrations/tests running without Flask app context. This breaks the
documented config contract and can unexpectedly reject valid SQL outside
request contexts; resolve by sourcing the configured fallback value instead of
hardcoding `_DEFAULT_MAX_PARSE_LENGTH`. [api mismatch]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ Custom SQL_MAX_PARSE_LENGTH ignored in non-Flask parser usage.
- ⚠️ Standalone scripts using SQLScript mis-handle large SQL queries.
- ⚠️ Migration or tooling code may unexpectedly reject valid statements.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Note configuration: `SQL_MAX_PARSE_LENGTH: int | None = 1_000_000` is
defined in
`superset/config.py:1388-1411` with comment "Set to None to disable the
check." Users can
override this (e.g., `SQL_MAX_PARSE_LENGTH = None`) in `superset_config.py`
expecting it
to globally disable parse-length enforcement.
2. Inspect `_check_script_length` in `superset/sql/parse.py:25-45`. When
`has_app_context()` is True, it reads `max_length =
current_app.config.get("SQL_MAX_PARSE_LENGTH", _DEFAULT_MAX_PARSE_LENGTH)`
at lines 36-39,
honoring runtime config. When `has_app_context()` is False, the `else`
branch at lines
40-41 unconditionally sets `max_length = _DEFAULT_MAX_PARSE_LENGTH`, ignoring
`SQL_MAX_PARSE_LENGTH` from app configuration.
3. Observe a real non-app-context usage pattern in tests: `_small_parse_cap`
in
`tests/unit_tests/sql/parse_tests.py:9-17` explicitly forces
`has_app_context` to return
False and uses `_DEFAULT_MAX_PARSE_LENGTH` to control the cap, and
`test_sqlscript_gate_short_circuits_before_sqlglot` at lines 70-84 constructs
`SQLScript(over_cap_with_backtick, "base")`, which calls
`_check_script_length` via
`SQLScript` → `SQLStatement._parse` (`superset/sql/parse.py:24-37`).
4. In any similar real-world context without a Flask app context (for
example, a
standalone script or a migration helper that imports `SQLScript` and calls
`SQLScript("x"
* 2_000_000, "base")` without pushing an app context, following the pattern
in the
migration
`superset/superset/migrations/versions/2022-04-01_14-38_a9422eeaae74_new_dataset_models_take_2.py:608-14`),
`_check_script_length` will execute the no-app-context branch and enforce
`_DEFAULT_MAX_PARSE_LENGTH` regardless of the configured
`SQL_MAX_PARSE_LENGTH`. Thus,
setting `SQL_MAX_PARSE_LENGTH = None` (or any other value) in configuration
has no effect
in these non-app-context executions, leading to inconsistent and surprising
behavior
compared to in-app parsing.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=3d6e21db81af49878369ea82b5b4afd5&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=3d6e21db81af49878369ea82b5b4afd5&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/sql/parse.py
**Line:** 75:80
**Comment:**
*Api Mismatch: The no-app-context branch ignores the configured
`SQL_MAX_PARSE_LENGTH` and always uses the hardcoded default, so setting
`SQL_MAX_PARSE_LENGTH = None` (or any custom value) does not take effect in
scripts/migrations/tests running without Flask app context. This breaks the
documented config contract and can unexpectedly reject valid SQL outside
request contexts; resolve by sourcing the configured fallback value instead of
hardcoding `_DEFAULT_MAX_PARSE_LENGTH`.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40499&comment_hash=f2d0b97a798b2a8c74e48049f8eed4a704b37c0d9f3c41eb7a37bd7d6a1d5291&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40499&comment_hash=f2d0b97a798b2a8c74e48049f8eed4a704b37c0d9f3c41eb7a37bd7d6a1d5291&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]