Re: [PR] refactor(sql_parse): enhance AST validation for subqueries and mutation checks [superset]

via GitHub Fri, 13 Mar 2026 05:38:20 -0700


dpgaspar commented on code in PR #38452:
URL: https://github.com/apache/superset/pull/38452#discussion_r2930929620



##########
superset/models/helpers.py:
##########
@@ -190,18 +193,33 @@ def validate_adhoc_subquery(
     catalog: str | None,
     default_schema: str,
     engine: str,
+    is_predicate: bool = False,
 ) -> str:
     """
     Check if adhoc SQL contains sub-queries or nested sub-queries with table.
 
     If sub-queries are allowed, the adhoc SQL is modified to insert any 
applicable RLS
     predicates to it.
 
-    :param sql: adhoc sql expression
+    :param sql: adhoc sql expression or predicate
+    :param is_predicate: if True, sql is treated as a predicate
+        (fragment of a WHERE clause).
     :raise SupersetSecurityException if sql contains sub-queries or
     nested sub-queries with table
     """
-    parsed_statement = SQLStatement(sql, engine)
+    from superset.sql.parse import strip_jinja
+
+    # NEW: Strip Jinja before parsing to avoid syntax errors
+    # while maintaining visibility of the SQL structure.
+    clean_sql = strip_jinja(sql)

Review Comment:
   The function strips Jinja (clean_sql = strip_jinja(sql)) but then does 
return parsed_statement.format() — returning the Jinja-stripped SQL, not the 
original. Every existing caller of
     validate_adhoc_subquery that uses the return value (e.g., 
_process_sql_expression in helpers.py) will now get back SQL with __JINJA_VAR__ 
placeholders instead of the original Jinja templates. This
     would break template processing downstream.
   
   The RLS create/update commands don't use the return value, so this only 
matters for the existing non-is_predicate callers. But it's a regression for 
those paths.
   
   



##########
superset/connectors/sqla/models.py:
##########
@@ -755,13 +755,30 @@ def get_sqla_row_level_filters(
         """
         template_processor = template_processor or 
self.get_template_processor()
 
+        def _get_processed_rls_clause(clause: str) -> TextClause:
+            processed = None
+            if hasattr(self, "_process_select_expression"):

Review Comment:
   - _process_select_expression is defined on ExploreMixin which SqlaTable 
inherits — hasattr should always be True in practice, making it dead-code 
guarding. If the intent is to handle subclasses that
     don't have it, an explicit protocol/ABC check would be clearer.
   - If _process_select_expression returns an empty string (which it can — it 
returns expression.strip() which could be ""), the falsy check if not processed 
would fall through to the template_processor
   path, silently bypassing the security validation that 
_process_select_expression was supposed to perform. It should check if 
processed is None instead of if not processed.



##########
superset/models/helpers.py:
##########
@@ -190,18 +193,33 @@ def validate_adhoc_subquery(
     catalog: str | None,
     default_schema: str,
     engine: str,
+    is_predicate: bool = False,
 ) -> str:
     """
     Check if adhoc SQL contains sub-queries or nested sub-queries with table.
 
     If sub-queries are allowed, the adhoc SQL is modified to insert any 
applicable RLS
     predicates to it.
 
-    :param sql: adhoc sql expression
+    :param sql: adhoc sql expression or predicate
+    :param is_predicate: if True, sql is treated as a predicate
+        (fragment of a WHERE clause).
     :raise SupersetSecurityException if sql contains sub-queries or
     nested sub-queries with table
     """
-    parsed_statement = SQLStatement(sql, engine)
+    from superset.sql.parse import strip_jinja
+
+    # NEW: Strip Jinja before parsing to avoid syntax errors
+    # while maintaining visibility of the SQL structure.
+    clean_sql = strip_jinja(sql)
+
+    if is_predicate:
+        # Use specialized predicate parsing for SQL fragments
+        # This is more robust for RLS clauses like 'column = value'
+        parsed_statement = SQLStatement(f"SELECT * WHERE {clean_sql}", engine)

Review Comment:
   If someone passes a clause like 1=1; DROP TABLE users, the SELECT * WHERE 
1=1; DROP TABLE users would be parsed as a multi-statement SQL, and 
SQLStatement only parses the first statement. The DROP
     TABLE would not be detected by has_subquery() or is_mutating().
   
   



##########
superset/models/helpers.py:
##########
@@ -213,7 +231,14 @@ def validate_adhoc_subquery(
             )
 
         # enforce RLS rules in any relevant tables
-        apply_rls(database, catalog, default_schema, parsed_statement)
+        # Use ContextVar to surgically break infinite recursion loops
+        depth = RLS_RECURSION_DEPTH.get()

Review Comment:
   This means apply_rls is only called at depth 0, and never at depth 1+. While 
this prevents infinite recursion, it also means that if an RLS rule on table A 
references table B which also has RLS rules,
     table B's RLS will be silently skipped. This is a security concern — 
nested RLS would be unenforced. The comment says "surgically break infinite 
recursion" but doesn't explain why depth 1 is the right
     cutoff vs. e.g. 2 or 3.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] refactor(sql_parse): enhance AST validation for subqueries and mutation checks [superset]

Reply via email to