dpgaspar commented on code in PR #38452:
URL: https://github.com/apache/superset/pull/38452#discussion_r2930929620
##########
superset/models/helpers.py:
##########
@@ -190,18 +193,33 @@ def validate_adhoc_subquery(
catalog: str | None,
default_schema: str,
engine: str,
+ is_predicate: bool = False,
) -> str:
"""
Check if adhoc SQL contains sub-queries or nested sub-queries with table.
If sub-queries are allowed, the adhoc SQL is modified to insert any
applicable RLS
predicates to it.
- :param sql: adhoc sql expression
+ :param sql: adhoc sql expression or predicate
+ :param is_predicate: if True, sql is treated as a predicate
+ (fragment of a WHERE clause).
:raise SupersetSecurityException if sql contains sub-queries or
nested sub-queries with table
"""
- parsed_statement = SQLStatement(sql, engine)
+ from superset.sql.parse import strip_jinja
+
+ # NEW: Strip Jinja before parsing to avoid syntax errors
+ # while maintaining visibility of the SQL structure.
+ clean_sql = strip_jinja(sql)
Review Comment:
The function strips Jinja (clean_sql = strip_jinja(sql)) but then does
return parsed_statement.format() — returning the Jinja-stripped SQL, not the
original. Every existing caller of
validate_adhoc_subquery that uses the return value (e.g.,
_process_sql_expression in helpers.py) will now get back SQL with __JINJA_VAR__
placeholders instead of the original Jinja templates. This
would break template processing downstream.
The RLS create/update commands don't use the return value, so this only
matters for the existing non-is_predicate callers. But it's a regression for
those paths.
##########
superset/connectors/sqla/models.py:
##########
@@ -755,13 +755,30 @@ def get_sqla_row_level_filters(
"""
template_processor = template_processor or
self.get_template_processor()
+ def _get_processed_rls_clause(clause: str) -> TextClause:
+ processed = None
+ if hasattr(self, "_process_select_expression"):
Review Comment:
- _process_select_expression is defined on ExploreMixin which SqlaTable
inherits — hasattr should always be True in practice, making it dead-code
guarding. If the intent is to handle subclasses that
don't have it, an explicit protocol/ABC check would be clearer.
- If _process_select_expression returns an empty string (which it can — it
returns expression.strip() which could be ""), the falsy check if not processed
would fall through to the template_processor
path, silently bypassing the security validation that
_process_select_expression was supposed to perform. It should check if
processed is None instead of if not processed.
##########
superset/models/helpers.py:
##########
@@ -190,18 +193,33 @@ def validate_adhoc_subquery(
catalog: str | None,
default_schema: str,
engine: str,
+ is_predicate: bool = False,
) -> str:
"""
Check if adhoc SQL contains sub-queries or nested sub-queries with table.
If sub-queries are allowed, the adhoc SQL is modified to insert any
applicable RLS
predicates to it.
- :param sql: adhoc sql expression
+ :param sql: adhoc sql expression or predicate
+ :param is_predicate: if True, sql is treated as a predicate
+ (fragment of a WHERE clause).
:raise SupersetSecurityException if sql contains sub-queries or
nested sub-queries with table
"""
- parsed_statement = SQLStatement(sql, engine)
+ from superset.sql.parse import strip_jinja
+
+ # NEW: Strip Jinja before parsing to avoid syntax errors
+ # while maintaining visibility of the SQL structure.
+ clean_sql = strip_jinja(sql)
+
+ if is_predicate:
+ # Use specialized predicate parsing for SQL fragments
+ # This is more robust for RLS clauses like 'column = value'
+ parsed_statement = SQLStatement(f"SELECT * WHERE {clean_sql}", engine)
Review Comment:
If someone passes a clause like 1=1; DROP TABLE users, the SELECT * WHERE
1=1; DROP TABLE users would be parsed as a multi-statement SQL, and
SQLStatement only parses the first statement. The DROP
TABLE would not be detected by has_subquery() or is_mutating().
##########
superset/models/helpers.py:
##########
@@ -213,7 +231,14 @@ def validate_adhoc_subquery(
)
# enforce RLS rules in any relevant tables
- apply_rls(database, catalog, default_schema, parsed_statement)
+ # Use ContextVar to surgically break infinite recursion loops
+ depth = RLS_RECURSION_DEPTH.get()
Review Comment:
This means apply_rls is only called at depth 0, and never at depth 1+. While
this prevents infinite recursion, it also means that if an RLS rule on table A
references table B which also has RLS rules,
table B's RLS will be silently skipped. This is a security concern —
nested RLS would be unenforced. The comment says "surgically break infinite
recursion" but doesn't explain why depth 1 is the right
cutoff vs. e.g. 2 or 3.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]