codeant-ai-for-open-source[bot] commented on code in PR #38488:
URL: https://github.com/apache/superset/pull/38488#discussion_r2899453713
##########
superset/sql/parse.py:
##########
@@ -553,13 +553,18 @@ class SQLStatement(BaseSQLStatement[exp.Expression]):
This class is used for all engines with dialects that can be parsed using
sqlglot.
"""
+ # Optimizer hints have the form /*+ ... */
+ _OPTIMIZER_HINT_PATTERN = re.compile(r"/\*\+")
+
def __init__(
self,
statement: str | None = None,
engine: str = "base",
ast: exp.Expression | None = None,
):
self._dialect = SQLGLOT_DIALECTS.get(engine)
+ # Store original statement to detect optimizer hints
+ self._original_statement: str | None = statement
Review Comment:
**Suggestion:** Optimizer hint detection in `format()` relies on
`_original_statement`, but when `SQLStatement` is instantiated from an AST
(e.g., via `split_script` or `optimize`), `_original_statement` is left as
`None`, so queries that were parsed into ASTs lose their original text and any
optimizer hints they contained, causing comment preservation to remain enabled
and the original hint/comment corruption bug to persist for those code paths.
Initializing `_original_statement` from the AST when no raw statement is
provided ensures consistent hint detection. [logic error]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ SQLScript.format outputs invalid SQL with optimizer hints.
- ⚠️ Optimized statements lose hint-aware comment handling entirely.
```
</details>
```suggestion
self._original_statement: str | None = (
statement or (ast.sql() if ast is not None else None)
)
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. In `superset/sql/parse.py:~1190`, construct a `SQLScript` with a
StarRocks/MySQL query
containing an optimizer hint, e.g. `SQLScript("SELECT /*+
SET_VAR(query_timeout=5) */ col
FROM t -- inline", engine="starrocks")`.
2. Inside `SQLScript.__init__` (`superset/sql/parse.py:~1180`),
`statement_class` is
`SQLStatement`, and `statement_class.split_script(script, engine)` is called.
3. In `SQLStatement.split_script` (`superset/sql/parse.py:~600`),
`_parse(script, engine)`
returns `exp.Expression` ASTs; `SQLStatement(ast=ast, engine=engine)` is
then invoked for
each AST with `statement=None`, so in `SQLStatement.__init__`
(`superset/sql/parse.py:567`) `self._original_statement` is set to `None`.
4. When the script is pretty-printed (e.g. `SQLScript.format(comments=True)`
in
`superset/sql/parse.py:~1200`), it calls `SQLStatement.format(comments)` for
each
statement; in `SQLStatement.format` (`superset/sql/parse.py:726-744`)
`has_optimizer_hints` is computed from `self._original_statement` which is
`None`, so
`effective_comments` stays `True` and `Dialect.generate(..., comments=True,
pretty=True)`
lets sqlglot reposition comments inside the `/*+ ... */` hint block,
reproducing the
original syntax-corruption bug for this code path.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/sql/parse.py
**Line:** 567:567
**Comment:**
*Logic Error: Optimizer hint detection in `format()` relies on
`_original_statement`, but when `SQLStatement` is instantiated from an AST
(e.g., via `split_script` or `optimize`), `_original_statement` is left as
`None`, so queries that were parsed into ASTs lose their original text and any
optimizer hints they contained, causing comment preservation to remain enabled
and the original hint/comment corruption bug to persist for those code paths.
Initializing `_original_statement` from the AST when no raw statement is
provided ensures consistent hint detection.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38488&comment_hash=e864f9fd3d85ef2b8dd58162b9daf1e0fe3d9732d6ed071a0804dd9778a40de0&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38488&comment_hash=e864f9fd3d85ef2b8dd58162b9daf1e0fe3d9732d6ed071a0804dd9778a40de0&reaction=dislike'>👎</a>
##########
tests/unit_tests/sql/parse_tests.py:
##########
@@ -3134,3 +3134,66 @@ def test_backtick_invalid_sql_still_fails() -> None:
sql = "SELECT * FROM `table` WHERE"
with pytest.raises(SupersetParseError):
SQLScript(sql, "base")
+
+
+def test_optimizer_hint_with_comment() -> None:
+ """
+ Test that optimizer hints are not corrupted by inline comments.
+
+ This is a regression test for issue #38189 where sqlglot incorrectly
+ injects converted inline comments (--) inside optimizer hint blocks
+ (/*+ ... */), breaking StarRocks/MySQL syntax.
+
+ The fix detects optimizer hints and disables comment preservation
+ to prevent sqlglot from corrupting the hint syntax.
+ """
+ # SQL with optimizer hint and inline comment
+ sql = """SELECT /*+ SET_VAR(query_timeout = 3000) */ col1, col2
+FROM my_table
+LIMIT 100
+
+-- increase timeout for large scans"""
+
+ script = SQLScript(sql, "mysql")
+ formatted = script.format(comments=True)
+
+ # The optimizer hint should remain intact, not have comments injected
inside
+ assert "/*+ SET_VAR(query_timeout = 3000) */" in formatted
+ # The inline comment should be stripped (not converted to /* */ inside
hint)
+ assert "query_timeout /* increase timeout" not in formatted
+ assert "3000) */" in formatted # Hint closing should be intact
+
+
+def test_optimizer_hint_disabled_comments() -> None:
+ """
+ Test that comments are disabled when optimizer hints are present.
+ """
+ sql = "SELECT /*+ SET_VAR(query_timeout = 3000) */ col FROM t"
+
+ script = SQLScript(sql, "mysql")
+ statement = script.statements[0]
+
+ # Check that the optimizer hint pattern is detected
+ assert statement._original_statement is not None
+ assert "/*+ SET_VAR(query_timeout = 3000) */" in
statement._original_statement
+
+ # When formatting with comments=True, it should still disable comments
+ # due to optimizer hint detection
+ formatted = statement.format(comments=True)
+ assert "/*+ SET_VAR(query_timeout = 3000) */" in formatted
+
Review Comment:
**Suggestion:** The test that claims to verify that comments are disabled
when optimizer hints are present never includes any inline comments in the SQL,
so it doesn't actually exercise or assert the behavior it describes; as a
result, regressions in comment handling around optimizer hints could pass this
test undetected. You should include an inline comment in the SQL and assert
that it is not preserved in the formatted output to truly validate the intended
behavior. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Optimizer hint comment regression may pass unit tests.
- ⚠️ StarRocks/MySQL hint syntax corruption risk resurfaces silently.
- ⚠️ CI coverage around sqlglot hint handling weakened.
```
</details>
```suggestion
def test_optimizer_hint_disabled_comments() -> None:
"""
Test that comments are disabled when optimizer hints are present.
"""
sql = """SELECT /*+ SET_VAR(query_timeout = 3000) */ col FROM t
-- some inline comment that should not be preserved"""
script = SQLScript(sql, "mysql")
statement = script.statements[0]
# Check that the optimizer hint pattern is detected
assert statement._original_statement is not None
assert "/*+ SET_VAR(query_timeout = 3000) */" in
statement._original_statement
# When formatting with comments=True, it should still disable
comments
# due to optimizer hint detection
formatted = statement.format(comments=True)
assert "/*+ SET_VAR(query_timeout = 3000) */" in formatted
# The inline comment should not be preserved when optimizer hints
are present
assert "some inline comment" not in formatted
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Open `tests/unit_tests/sql/parse_tests.py` at lines 3168–3172 and read
`test_optimizer_hint_disabled_comments`, whose docstring says "Test that
comments are
disabled when optimizer hints are present."
2. Inspect the `sql` definition at line 3172: `sql = "SELECT /*+
SET_VAR(query_timeout =
3000) */ col FROM t"`, which contains an optimizer hint but no `--` inline
comment or
other non-hint comments.
3. Follow the test flow at lines 3174–3184: `SQLScript(sql, "mysql")` is
created,
`statement.format(comments=True)` is called, and the only behavior asserted
is that the
optimizer hint substring `"/*+ SET_VAR(query_timeout = 3000) */"` remains
present in
`formatted`.
4. Note that nowhere in this test is any comment text asserted to be removed
or not
preserved, so the "comments are disabled" behavior is never exercised; a
future regression
in comment handling around optimizer hints in `SQLScript`/`SQLStatement`
(implemented in
`superset/sql/parse.py`, imported at the top of this file) could occur while
this specific
test still passes.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** tests/unit_tests/sql/parse_tests.py
**Line:** 3168:3184
**Comment:**
*Logic Error: The test that claims to verify that comments are disabled
when optimizer hints are present never includes any inline comments in the SQL,
so it doesn't actually exercise or assert the behavior it describes; as a
result, regressions in comment handling around optimizer hints could pass this
test undetected. You should include an inline comment in the SQL and assert
that it is not preserved in the formatted output to truly validate the intended
behavior.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38488&comment_hash=e895aa207be9f82bf70bdecbf7635fc75b8be93bfa8230454b1d9176b66d0626&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38488&comment_hash=e895aa207be9f82bf70bdecbf7635fc75b8be93bfa8230454b1d9176b66d0626&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]