codeant-ai-for-open-source[bot] commented on code in PR #38452:
URL: https://github.com/apache/superset/pull/38452#discussion_r2935575527


##########
superset/sql/parse.py:
##########
@@ -135,6 +135,27 @@ class CTASMethod(enum.Enum):
     VIEW = enum.auto()
 
 
+def strip_jinja(sql: str) -> str:
+    """
+    Replaces Jinja tags with parser-safe placeholders before SQL parsing.
+
+    Variable tags ({{ ... }}) are replaced with a dummy SQL identifier so the
+    surrounding SQL remains syntactically valid. Block tags ({% ... %}) and
+    comments ({# ... #}) are removed entirely.
+
+    Jinja content is Python, not SQL, so it is never inspected for SQL 
keywords.
+    Subqueries embedded inside Jinja blocks are not valid Python expressions 
and
+    would fail harmlessly at Jinja render time without leaking data.
+    """
+    # Remove comments
+    sql = re.sub(r"\{#.*?#\}", "", sql, flags=re.DOTALL)
+    # Replace variable tags with a safe SQL identifier
+    sql = re.sub(r"\{\{.*?\}\}", "__jinja__", sql, flags=re.DOTALL)
+    # Remove block tags ({% if %}, {% endif %}, {% set %}, etc.)
+    sql = re.sub(r"\{%.*?%\}", "", sql, flags=re.DOTALL)

Review Comment:
   **Suggestion:** The Jinja stripping is fail-open: both `{{ ... }}` and `{% 
... %}` are blindly replaced/removed, so SQL control keywords embedded inside 
Jinja can bypass `has_subquery()` checks and therefore bypass the 
`ALLOW_ADHOC_SUBQUERY` security gate. Replace tags conditionally so 
keyword-bearing Jinja content is mapped to a synthetic subquery marker (for 
fail-closed detection), while non-SQL Jinja remains a harmless placeholder. 
[security]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ RLS rule creation misses hidden Jinja SQL keywords.
   - ⚠️ Subquery policy enforcement becomes inconsistent across layers.
   - ❌ Runtime queries can fail from previously accepted rules.
   ```
   </details>
   
   ```suggestion
       keyword_pattern = re.compile(
           
r"\b(select|union|insert|update|delete|create|drop|alter|truncate|except|intersect|merge)\b",
           flags=re.IGNORECASE,
       )
   
       def _replace_var_tag(match):
           content = match.group(1)
           return "(SELECT 1)" if keyword_pattern.search(content) else 
"__JINJA_VAR__"
   
       def _replace_block_tag(match):
           content = match.group(1)
           return "(SELECT 1)" if keyword_pattern.search(content) else 
"__JINJA_VAR__"
   
       # Replace variable tags with a parser-safe marker
       sql = re.sub(r"\{\{(.*?)\}\}", _replace_var_tag, sql, flags=re.DOTALL)
       # Replace block tags with fail-closed markers when they contain SQL 
control words
       sql = re.sub(r"\{%(.*?)%\}", _replace_block_tag, sql, flags=re.DOTALL)
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Set `ALLOW_ADHOC_SUBQUERY=False` (default at `superset/config.py:700`) 
and create an
   RLS rule through `POST` on the Row Level Security REST API
   (`superset/row_level_security/api.py:64`, `:201` calls
   `CreateRLSRuleCommand(item).run()`).
   
   2. Use a Jinja-only clause that contains SQL control words inside the Jinja 
payload (for
   example a Jinja expression/string containing `SELECT`), so validation reaches
   `validate_adhoc_subquery()` (`superset/models/helpers.py:193`, called by 
`create.py:60` /
   `update.py:75`).
   
   3. In validation, `strip_jinja()` (`superset/sql/parse.py:138`) replaces 
`{{...}}` with
   `__jinja__` and removes `{%...%}` (`parse.py:153-155`), so
   `parsed_statement.has_subquery()` at `helpers.py:247` does not see the 
hidden SQL and the
   rule is accepted.
   
   4. Later, when queries apply RLS, `_process_rls_clause()`
   (`superset/connectors/sqla/models.py:789`) runs template processing then 
validation again;
   this can now trip subquery checks and raise security validation errors at 
runtime instead
   of blocking at rule-creation time.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/sql/parse.py
   **Line:** 152:155
   **Comment:**
        *Security: The Jinja stripping is fail-open: both `{{ ... }}` and `{% 
... %}` are blindly replaced/removed, so SQL control keywords embedded inside 
Jinja can bypass `has_subquery()` checks and therefore bypass the 
`ALLOW_ADHOC_SUBQUERY` security gate. Replace tags conditionally so 
keyword-bearing Jinja content is mapped to a synthetic subquery marker (for 
fail-closed detection), while non-SQL Jinja remains a harmless placeholder.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38452&comment_hash=e90235187f79f918190cf91314ad3f0127f80d25d08d744c8dd50b39949a9df2&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38452&comment_hash=e90235187f79f918190cf91314ad3f0127f80d25d08d744c8dd50b39949a9df2&reaction=dislike'>👎</a>



##########
tests/integration_tests/security/row_level_security_tests.py:
##########
@@ -263,12 +263,22 @@ def test_rls_filter_alters_gamma_birth_names_query(self):
         g.user = self.get_user(username="gamma")
         tbl = self.get_table(name="birth_names")
         sql = tbl.get_query_str(self.query_obj)
-
-        # establish that the filters are grouped together correctly with
-        # ANDs, ORs and parens in the correct place
-        assert (
-            "WHERE ((name like 'A%' or name like 'B%') OR (name like 'Q%')) 
AND (gender = 'boy');"  # noqa: E501
-            in sql
+        sql_normalized = " ".join(sql.lower().split())
+
+        # Establish that all filters are present in the query.
+        # Use regex to be agnostic of parenthesis nesting and order.
+        assert re.search(r"name\s+like\s+'a%'", sql_normalized, re.IGNORECASE)
+        assert re.search(r"name\s+like\s+'b%'", sql_normalized, re.IGNORECASE)
+        assert re.search(r"name\s+like\s+'q%'", sql_normalized, re.IGNORECASE)
+        assert re.search(r"gender\s*=\s*'boy'", sql_normalized, re.IGNORECASE)
+
+        # Verify AND/OR grouping semantics: the name filters must be 
OR-grouped,
+        # and the gender filter must be AND-ed with the name group (not OR-ed 
in).
+        assert re.search(
+            r"name\s+like\s+'a%'\s+or\s+name\s+like\s+'b%'", sql_normalized
+        ), "A% and B% name filters must be OR-grouped"
+        assert re.search(r"\band\b.*gender\s*=\s*'boy'", sql_normalized), (
+            "gender filter must be AND-connected to the name filter group"
         )

Review Comment:
   **Suggestion:** The new grouping checks are order-dependent even though SQL 
clause ordering is not guaranteed, so this test can fail intermittently when 
equivalent predicates are emitted in a different order (for example `gender = 
'boy'` appearing before the name group). Make the regex assertions 
order-agnostic so they validate semantics instead of string order. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ RLS integration test fails on equivalent predicate ordering.
   - ⚠️ CI can fail intermittently on harmless SQL differences.
   ```
   </details>
   
   ```suggestion
           assert re.search(
               
r"(name\s+like\s+'a%'\s+or\s+name\s+like\s+'b%')|(name\s+like\s+'b%'\s+or\s+name\s+like\s+'a%')",
               sql_normalized,
           ), "A% and B% name filters must be OR-grouped"
           assert re.search(
               r"(gender\s*=\s*'boy'.*\band\b)|(\\band\\b.*gender\s*=\s*'boy')",
               sql_normalized,
           ), "gender filter must be AND-connected to the name filter group"
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Run `pytest
   
tests/integration_tests/security/row_level_security_tests.py::TestRowLevelSecurity::test_rls_filter_alters_gamma_birth_names_query`
   so execution enters `test_rls_filter_alters_gamma_birth_names_query` in
   `tests/integration_tests/security/row_level_security_tests.py:262`.
   
   2. The test calls `tbl.get_query_str(...)`, which flows through query 
building in
   `superset/models/helpers.py:2261` and applies RLS filters in
   `superset/models/helpers.py:3290-3322`.
   
   3. RLS filter order is sourced from `security_manager.get_rls_filters()` in
   `superset/security/manager.py:2763-2794`, where `query.all()` has no 
`order_by`;
   `_get_processed_rls_filters()` in 
`superset/connectors/sqla/models.py:805-837` preserves
   whatever order is returned.
   
   4. When SQL is rendered with semantically equivalent ordering (e.g., 
`gender='boy' AND
   (...)` instead of `(...) AND gender='boy'`), the current regex at
   `tests/integration_tests/security/row_level_security_tests.py:280-282` fails 
despite
   correct logic, creating flaky test failures.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/integration_tests/security/row_level_security_tests.py
   **Line:** 277:282
   **Comment:**
        *Logic Error: The new grouping checks are order-dependent even though 
SQL clause ordering is not guaranteed, so this test can fail intermittently 
when equivalent predicates are emitted in a different order (for example 
`gender = 'boy'` appearing before the name group). Make the regex assertions 
order-agnostic so they validate semantics instead of string order.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38452&comment_hash=dd32f9529c0292a431e2392b290caf9d85a947302dd7561d55ac735c95ef8a9d&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38452&comment_hash=dd32f9529c0292a431e2392b290caf9d85a947302dd7561d55ac735c95ef8a9d&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to