codeant-ai-for-open-source[bot] commented on code in PR #40499:
URL: https://github.com/apache/superset/pull/40499#discussion_r3405717589


##########
superset/sql/parse.py:
##########
@@ -54,6 +55,43 @@
 logger = logging.getLogger(__name__)
 
 
+# Fallback parse-length bound applied when no Flask app context is active
+# (Alembic migrations, scripts, isolated unit tests). The runtime value is
+# read from `SQL_MAX_PARSE_LENGTH` in app config; keep these two in sync.
+_DEFAULT_MAX_PARSE_LENGTH: int = 1_000_000
+
+
+def _check_script_length(script: str, engine: str | None) -> None:
+    """
+    Reject scripts whose UTF-8 byte length exceeds the configured maximum
+    before they reach sqlglot. Sits at every code path in this module that
+    hands a string to ``sqlglot.parse`` or ``sqlglot.parse_one`` so the
+    bound cannot be bypassed by a direct caller.
+
+    The check is in bytes, not Unicode code points, because the
+    threat model is parser memory and CPU on the encoded payload that
+    sqlglot ingests.
+    """
+    if has_app_context():
+        max_length = current_app.config.get(
+            "SQL_MAX_PARSE_LENGTH", _DEFAULT_MAX_PARSE_LENGTH
+        )
+    else:
+        max_length = _DEFAULT_MAX_PARSE_LENGTH

Review Comment:
   **Suggestion:** The no-app-context branch ignores the configured 
`SQL_MAX_PARSE_LENGTH` and always uses the hardcoded default, so setting 
`SQL_MAX_PARSE_LENGTH = None` (or any custom value) does not take effect in 
scripts/migrations/tests running without Flask app context. This breaks the 
documented config contract and can unexpectedly reject valid SQL outside 
request contexts; resolve by sourcing the configured fallback value instead of 
hardcoding `_DEFAULT_MAX_PARSE_LENGTH`. [api mismatch]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Custom SQL_MAX_PARSE_LENGTH ignored in non-Flask parser usage.
   - ⚠️ Standalone scripts using SQLScript mis-handle large SQL queries.
   - ⚠️ Migration or tooling code may unexpectedly reject valid statements.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Note configuration: `SQL_MAX_PARSE_LENGTH: int | None = 1_000_000` is 
defined in
   `superset/config.py:1388-1411` with comment "Set to None to disable the 
check." Users can
   override this (e.g., `SQL_MAX_PARSE_LENGTH = None`) in `superset_config.py` 
expecting it
   to globally disable parse-length enforcement.
   
   2. Inspect `_check_script_length` in `superset/sql/parse.py:25-45`. When
   `has_app_context()` is True, it reads `max_length =
   current_app.config.get("SQL_MAX_PARSE_LENGTH", _DEFAULT_MAX_PARSE_LENGTH)` 
at lines 36-39,
   honoring runtime config. When `has_app_context()` is False, the `else` 
branch at lines
   40-41 unconditionally sets `max_length = _DEFAULT_MAX_PARSE_LENGTH`, ignoring
   `SQL_MAX_PARSE_LENGTH` from app configuration.
   
   3. Observe a real non-app-context usage pattern in tests: `_small_parse_cap` 
in
   `tests/unit_tests/sql/parse_tests.py:9-17` explicitly forces 
`has_app_context` to return
   False and uses `_DEFAULT_MAX_PARSE_LENGTH` to control the cap, and
   `test_sqlscript_gate_short_circuits_before_sqlglot` at lines 70-84 constructs
   `SQLScript(over_cap_with_backtick, "base")`, which calls 
`_check_script_length` via
   `SQLScript` → `SQLStatement._parse` (`superset/sql/parse.py:24-37`).
   
   4. In any similar real-world context without a Flask app context (for 
example, a
   standalone script or a migration helper that imports `SQLScript` and calls 
`SQLScript("x"
   * 2_000_000, "base")` without pushing an app context, following the pattern 
in the
   migration
   
`superset/superset/migrations/versions/2022-04-01_14-38_a9422eeaae74_new_dataset_models_take_2.py:608-14`),
   `_check_script_length` will execute the no-app-context branch and enforce
   `_DEFAULT_MAX_PARSE_LENGTH` regardless of the configured 
`SQL_MAX_PARSE_LENGTH`. Thus,
   setting `SQL_MAX_PARSE_LENGTH = None` (or any other value) in configuration 
has no effect
   in these non-app-context executions, leading to inconsistent and surprising 
behavior
   compared to in-app parsing.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=3d6e21db81af49878369ea82b5b4afd5&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=3d6e21db81af49878369ea82b5b4afd5&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/sql/parse.py
   **Line:** 75:80
   **Comment:**
        *Api Mismatch: The no-app-context branch ignores the configured 
`SQL_MAX_PARSE_LENGTH` and always uses the hardcoded default, so setting 
`SQL_MAX_PARSE_LENGTH = None` (or any custom value) does not take effect in 
scripts/migrations/tests running without Flask app context. This breaks the 
documented config contract and can unexpectedly reject valid SQL outside 
request contexts; resolve by sourcing the configured fallback value instead of 
hardcoding `_DEFAULT_MAX_PARSE_LENGTH`.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40499&comment_hash=f2d0b97a798b2a8c74e48049f8eed4a704b37c0d9f3c41eb7a37bd7d6a1d5291&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40499&comment_hash=f2d0b97a798b2a8c74e48049f8eed4a704b37c0d9f3c41eb7a37bd7d6a1d5291&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to