codeant-ai-for-open-source[bot] commented on code in PR #38452:
URL: https://github.com/apache/superset/pull/38452#discussion_r2929450581


##########
superset/sql/parse.py:
##########
@@ -135,6 +135,33 @@ class CTASMethod(enum.Enum):
     VIEW = enum.auto()
 
 
+def strip_jinja(sql: str) -> str:
+    """
+    Replaces Jinja tags with parser-safe placeholders.
+
+    If a tag contains SQL keywords that could indicate a subquery or mutation,
+    it is replaced with a synthetic subquery to ensure it is flagged by
+    security checks. Otherwise, it is replaced with a dummy identifier.
+    """
+    sql_keyword_pattern = re.compile(
+        
r"\b(select|union|except|intersect|insert|update|delete|merge|create|drop|alter|truncate)\b",
+        flags=re.IGNORECASE,
+    )
+
+    def _replace(match: re.Match[str]) -> str:
+        body = match.group(1)
+        if sql_keyword_pattern.search(body):
+            return "(SELECT 1)"
+        return "__JINJA_VAR__"
+
+    # Replace comments {# ... #} with empty string
+    sql = re.sub(r"\{#.*?#\}", "", sql, flags=re.DOTALL)
+    # Replace {{ ... }} and {% ... %} using the keyword-aware replacer
+    sql = re.sub(r"\{\{(.*?)\}\}", _replace, sql, flags=re.DOTALL)
+    sql = re.sub(r"\{%(.*?)%\}", _replace, sql, flags=re.DOTALL)

Review Comment:
   **Suggestion:** Replacing non-keyword `{% ... %}` control tags with 
`__JINJA_VAR__` injects invalid tokens into SQL (for example around `{% if %}` 
/ `{% endif %}`), which can make valid templated predicates fail parsing and 
break RLS validation. For statement tags, replace non-keyword blocks with an 
empty string instead of an identifier. [possible bug]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ RLS create endpoint rejects valid Jinja control clauses.
   - ❌ RLS update endpoint fails via same validator path.
   - ⚠️ Admin policy authoring workflow breaks with server errors.
   ```
   </details>
   
   ```suggestion
       sql = re.sub(
           r"\{%(.*?)%\}",
           lambda match: "(SELECT 1)"
           if sql_keyword_pattern.search(match.group(1))
           else "",
           sql,
           flags=re.DOTALL,
       )
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Submit RLS create request to `POST /api/v1/rowlevelsecurity/`
   (`superset/row_level_security/api.py:151-202`) with a clause using control 
tags, e.g. `{%
   if current_user_id() %} owner_id = {{ current_user_id() }} {% endif %}`.
   
   2. Request reaches `CreateRLSRuleCommand.validate()`
   (`superset/commands/security/create.py:45-67`), which calls 
`validate_adhoc_subquery(...,
   is_predicate=True)` at line 60.
   
   3. `validate_adhoc_subquery()` calls `strip_jinja()` 
(`superset/models/helpers.py:41-50`);
   `strip_jinja` currently replaces `{% ... %}` with `__JINJA_VAR__`
   (`superset/sql/parse.py:161`), creating non-SQL tokens around control-flow 
blocks.
   
   4. Predicate parsing then builds `SQLStatement(f"SELECT * WHERE 
{clean_sql}", engine)`
   (`superset/models/helpers.py:50`), and parser fails on injected tokens; 
exception
   propagates because `RLSRestApi.post()` does not catch parse/security 
exceptions (it only
   handles role/datasource/SQLAlchemy errors at
   `superset/row_level_security/api.py:203-226`), so rule save fails.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/sql/parse.py
   **Line:** 161:161
   **Comment:**
        *Possible Bug: Replacing non-keyword `{% ... %}` control tags with 
`__JINJA_VAR__` injects invalid tokens into SQL (for example around `{% if %}` 
/ `{% endif %}`), which can make valid templated predicates fail parsing and 
break RLS validation. For statement tags, replace non-keyword blocks with an 
empty string instead of an identifier.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38452&comment_hash=2615523eb6e70313d627e8c53c1c0fdba6ce563dae665a211e879811d3f3c385&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38452&comment_hash=2615523eb6e70313d627e8c53c1c0fdba6ce563dae665a211e879811d3f3c385&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to