codeant-ai-for-open-source[bot] commented on code in PR #38452:
URL: https://github.com/apache/superset/pull/38452#discussion_r2935575527
##########
superset/sql/parse.py:
##########
@@ -135,6 +135,27 @@ class CTASMethod(enum.Enum):
VIEW = enum.auto()
+def strip_jinja(sql: str) -> str:
+ """
+ Replaces Jinja tags with parser-safe placeholders before SQL parsing.
+
+ Variable tags ({{ ... }}) are replaced with a dummy SQL identifier so the
+ surrounding SQL remains syntactically valid. Block tags ({% ... %}) and
+ comments ({# ... #}) are removed entirely.
+
+ Jinja content is Python, not SQL, so it is never inspected for SQL
keywords.
+ Subqueries embedded inside Jinja blocks are not valid Python expressions
and
+ would fail harmlessly at Jinja render time without leaking data.
+ """
+ # Remove comments
+ sql = re.sub(r"\{#.*?#\}", "", sql, flags=re.DOTALL)
+ # Replace variable tags with a safe SQL identifier
+ sql = re.sub(r"\{\{.*?\}\}", "__jinja__", sql, flags=re.DOTALL)
+ # Remove block tags ({% if %}, {% endif %}, {% set %}, etc.)
+ sql = re.sub(r"\{%.*?%\}", "", sql, flags=re.DOTALL)
Review Comment:
**Suggestion:** The Jinja stripping is fail-open: both `{{ ... }}` and `{%
... %}` are blindly replaced/removed, so SQL control keywords embedded inside
Jinja can bypass `has_subquery()` checks and therefore bypass the
`ALLOW_ADHOC_SUBQUERY` security gate. Replace tags conditionally so
keyword-bearing Jinja content is mapped to a synthetic subquery marker (for
fail-closed detection), while non-SQL Jinja remains a harmless placeholder.
[security]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ RLS rule creation misses hidden Jinja SQL keywords.
- ⚠️ Subquery policy enforcement becomes inconsistent across layers.
- ❌ Runtime queries can fail from previously accepted rules.
```
</details>
```suggestion
keyword_pattern = re.compile(
r"\b(select|union|insert|update|delete|create|drop|alter|truncate|except|intersect|merge)\b",
flags=re.IGNORECASE,
)
def _replace_var_tag(match):
content = match.group(1)
return "(SELECT 1)" if keyword_pattern.search(content) else
"__JINJA_VAR__"
def _replace_block_tag(match):
content = match.group(1)
return "(SELECT 1)" if keyword_pattern.search(content) else
"__JINJA_VAR__"
# Replace variable tags with a parser-safe marker
sql = re.sub(r"\{\{(.*?)\}\}", _replace_var_tag, sql, flags=re.DOTALL)
# Replace block tags with fail-closed markers when they contain SQL
control words
sql = re.sub(r"\{%(.*?)%\}", _replace_block_tag, sql, flags=re.DOTALL)
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Set `ALLOW_ADHOC_SUBQUERY=False` (default at `superset/config.py:700`)
and create an
RLS rule through `POST` on the Row Level Security REST API
(`superset/row_level_security/api.py:64`, `:201` calls
`CreateRLSRuleCommand(item).run()`).
2. Use a Jinja-only clause that contains SQL control words inside the Jinja
payload (for
example a Jinja expression/string containing `SELECT`), so validation reaches
`validate_adhoc_subquery()` (`superset/models/helpers.py:193`, called by
`create.py:60` /
`update.py:75`).
3. In validation, `strip_jinja()` (`superset/sql/parse.py:138`) replaces
`{{...}}` with
`__jinja__` and removes `{%...%}` (`parse.py:153-155`), so
`parsed_statement.has_subquery()` at `helpers.py:247` does not see the
hidden SQL and the
rule is accepted.
4. Later, when queries apply RLS, `_process_rls_clause()`
(`superset/connectors/sqla/models.py:789`) runs template processing then
validation again;
this can now trip subquery checks and raise security validation errors at
runtime instead
of blocking at rule-creation time.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/sql/parse.py
**Line:** 152:155
**Comment:**
*Security: The Jinja stripping is fail-open: both `{{ ... }}` and `{%
... %}` are blindly replaced/removed, so SQL control keywords embedded inside
Jinja can bypass `has_subquery()` checks and therefore bypass the
`ALLOW_ADHOC_SUBQUERY` security gate. Replace tags conditionally so
keyword-bearing Jinja content is mapped to a synthetic subquery marker (for
fail-closed detection), while non-SQL Jinja remains a harmless placeholder.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38452&comment_hash=e90235187f79f918190cf91314ad3f0127f80d25d08d744c8dd50b39949a9df2&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38452&comment_hash=e90235187f79f918190cf91314ad3f0127f80d25d08d744c8dd50b39949a9df2&reaction=dislike'>👎</a>
##########
tests/integration_tests/security/row_level_security_tests.py:
##########
@@ -263,12 +263,22 @@ def test_rls_filter_alters_gamma_birth_names_query(self):
g.user = self.get_user(username="gamma")
tbl = self.get_table(name="birth_names")
sql = tbl.get_query_str(self.query_obj)
-
- # establish that the filters are grouped together correctly with
- # ANDs, ORs and parens in the correct place
- assert (
- "WHERE ((name like 'A%' or name like 'B%') OR (name like 'Q%'))
AND (gender = 'boy');" # noqa: E501
- in sql
+ sql_normalized = " ".join(sql.lower().split())
+
+ # Establish that all filters are present in the query.
+ # Use regex to be agnostic of parenthesis nesting and order.
+ assert re.search(r"name\s+like\s+'a%'", sql_normalized, re.IGNORECASE)
+ assert re.search(r"name\s+like\s+'b%'", sql_normalized, re.IGNORECASE)
+ assert re.search(r"name\s+like\s+'q%'", sql_normalized, re.IGNORECASE)
+ assert re.search(r"gender\s*=\s*'boy'", sql_normalized, re.IGNORECASE)
+
+ # Verify AND/OR grouping semantics: the name filters must be
OR-grouped,
+ # and the gender filter must be AND-ed with the name group (not OR-ed
in).
+ assert re.search(
+ r"name\s+like\s+'a%'\s+or\s+name\s+like\s+'b%'", sql_normalized
+ ), "A% and B% name filters must be OR-grouped"
+ assert re.search(r"\band\b.*gender\s*=\s*'boy'", sql_normalized), (
+ "gender filter must be AND-connected to the name filter group"
)
Review Comment:
**Suggestion:** The new grouping checks are order-dependent even though SQL
clause ordering is not guaranteed, so this test can fail intermittently when
equivalent predicates are emitted in a different order (for example `gender =
'boy'` appearing before the name group). Make the regex assertions
order-agnostic so they validate semantics instead of string order. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ RLS integration test fails on equivalent predicate ordering.
- ⚠️ CI can fail intermittently on harmless SQL differences.
```
</details>
```suggestion
assert re.search(
r"(name\s+like\s+'a%'\s+or\s+name\s+like\s+'b%')|(name\s+like\s+'b%'\s+or\s+name\s+like\s+'a%')",
sql_normalized,
), "A% and B% name filters must be OR-grouped"
assert re.search(
r"(gender\s*=\s*'boy'.*\band\b)|(\\band\\b.*gender\s*=\s*'boy')",
sql_normalized,
), "gender filter must be AND-connected to the name filter group"
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Run `pytest
tests/integration_tests/security/row_level_security_tests.py::TestRowLevelSecurity::test_rls_filter_alters_gamma_birth_names_query`
so execution enters `test_rls_filter_alters_gamma_birth_names_query` in
`tests/integration_tests/security/row_level_security_tests.py:262`.
2. The test calls `tbl.get_query_str(...)`, which flows through query
building in
`superset/models/helpers.py:2261` and applies RLS filters in
`superset/models/helpers.py:3290-3322`.
3. RLS filter order is sourced from `security_manager.get_rls_filters()` in
`superset/security/manager.py:2763-2794`, where `query.all()` has no
`order_by`;
`_get_processed_rls_filters()` in
`superset/connectors/sqla/models.py:805-837` preserves
whatever order is returned.
4. When SQL is rendered with semantically equivalent ordering (e.g.,
`gender='boy' AND
(...)` instead of `(...) AND gender='boy'`), the current regex at
`tests/integration_tests/security/row_level_security_tests.py:280-282` fails
despite
correct logic, creating flaky test failures.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** tests/integration_tests/security/row_level_security_tests.py
**Line:** 277:282
**Comment:**
*Logic Error: The new grouping checks are order-dependent even though
SQL clause ordering is not guaranteed, so this test can fail intermittently
when equivalent predicates are emitted in a different order (for example
`gender = 'boy'` appearing before the name group). Make the regex assertions
order-agnostic so they validate semantics instead of string order.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38452&comment_hash=dd32f9529c0292a431e2392b290caf9d85a947302dd7561d55ac735c95ef8a9d&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38452&comment_hash=dd32f9529c0292a431e2392b290caf9d85a947302dd7561d55ac735c95ef8a9d&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]