codeant-ai-for-open-source[bot] commented on code in PR #38452:
URL: https://github.com/apache/superset/pull/38452#discussion_r2929450581
##########
superset/sql/parse.py:
##########
@@ -135,6 +135,33 @@ class CTASMethod(enum.Enum):
VIEW = enum.auto()
+def strip_jinja(sql: str) -> str:
+ """
+ Replaces Jinja tags with parser-safe placeholders.
+
+ If a tag contains SQL keywords that could indicate a subquery or mutation,
+ it is replaced with a synthetic subquery to ensure it is flagged by
+ security checks. Otherwise, it is replaced with a dummy identifier.
+ """
+ sql_keyword_pattern = re.compile(
+
r"\b(select|union|except|intersect|insert|update|delete|merge|create|drop|alter|truncate)\b",
+ flags=re.IGNORECASE,
+ )
+
+ def _replace(match: re.Match[str]) -> str:
+ body = match.group(1)
+ if sql_keyword_pattern.search(body):
+ return "(SELECT 1)"
+ return "__JINJA_VAR__"
+
+ # Replace comments {# ... #} with empty string
+ sql = re.sub(r"\{#.*?#\}", "", sql, flags=re.DOTALL)
+ # Replace {{ ... }} and {% ... %} using the keyword-aware replacer
+ sql = re.sub(r"\{\{(.*?)\}\}", _replace, sql, flags=re.DOTALL)
+ sql = re.sub(r"\{%(.*?)%\}", _replace, sql, flags=re.DOTALL)
Review Comment:
**Suggestion:** Replacing non-keyword `{% ... %}` control tags with
`__JINJA_VAR__` injects invalid tokens into SQL (for example around `{% if %}`
/ `{% endif %}`), which can make valid templated predicates fail parsing and
break RLS validation. For statement tags, replace non-keyword blocks with an
empty string instead of an identifier. [possible bug]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ RLS create endpoint rejects valid Jinja control clauses.
- ❌ RLS update endpoint fails via same validator path.
- ⚠️ Admin policy authoring workflow breaks with server errors.
```
</details>
```suggestion
sql = re.sub(
r"\{%(.*?)%\}",
lambda match: "(SELECT 1)"
if sql_keyword_pattern.search(match.group(1))
else "",
sql,
flags=re.DOTALL,
)
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Submit RLS create request to `POST /api/v1/rowlevelsecurity/`
(`superset/row_level_security/api.py:151-202`) with a clause using control
tags, e.g. `{%
if current_user_id() %} owner_id = {{ current_user_id() }} {% endif %}`.
2. Request reaches `CreateRLSRuleCommand.validate()`
(`superset/commands/security/create.py:45-67`), which calls
`validate_adhoc_subquery(...,
is_predicate=True)` at line 60.
3. `validate_adhoc_subquery()` calls `strip_jinja()`
(`superset/models/helpers.py:41-50`);
`strip_jinja` currently replaces `{% ... %}` with `__JINJA_VAR__`
(`superset/sql/parse.py:161`), creating non-SQL tokens around control-flow
blocks.
4. Predicate parsing then builds `SQLStatement(f"SELECT * WHERE
{clean_sql}", engine)`
(`superset/models/helpers.py:50`), and parser fails on injected tokens;
exception
propagates because `RLSRestApi.post()` does not catch parse/security
exceptions (it only
handles role/datasource/SQLAlchemy errors at
`superset/row_level_security/api.py:203-226`), so rule save fails.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/sql/parse.py
**Line:** 161:161
**Comment:**
*Possible Bug: Replacing non-keyword `{% ... %}` control tags with
`__JINJA_VAR__` injects invalid tokens into SQL (for example around `{% if %}`
/ `{% endif %}`), which can make valid templated predicates fail parsing and
break RLS validation. For statement tags, replace non-keyword blocks with an
empty string instead of an identifier.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38452&comment_hash=2615523eb6e70313d627e8c53c1c0fdba6ce563dae665a211e879811d3f3c385&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38452&comment_hash=2615523eb6e70313d627e8c53c1c0fdba6ce563dae665a211e879811d3f3c385&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]