korbit-ai[bot] commented on code in PR #36061:
URL: https://github.com/apache/superset/pull/36061#discussion_r2518503946
##########
superset/models/helpers.py:
##########
@@ -1196,6 +1199,24 @@ def get_from_clause(
_("Virtual dataset query must be read-only")
)
+ # Apply RLS filters to virtual dataset SQL to prevent RLS bypass
+ # For each table referenced in the virtual dataset, apply its RLS
filters
+ if parsed_script.statements:
+ default_schema = self.database.get_default_schema(self.catalog)
+ try:
+ for statement in parsed_script.statements:
+ apply_rls(
+ self.database,
+ self.catalog,
+ self.schema or default_schema or "",
+ statement,
+ )
Review Comment:
### Missing RLS computation caching <sub></sub>
<details>
<summary>Tell me more</summary>
###### What is the issue?
RLS filters are applied to each SQL statement in a loop without any caching
mechanism, potentially causing expensive repeated computations for the same
database/schema combinations.
###### Why this matters
This could lead to significant performance degradation when processing
virtual datasets with multiple statements, as RLS computation may involve
database queries or complex rule evaluations that are repeated unnecessarily.
###### Suggested change ∙ *Feature Preview*
Implement caching for RLS filter results based on database, catalog, and
schema combinations. Consider using a method-level cache or storing computed
RLS filters in a dictionary keyed by (database_id, catalog, schema) to avoid
recomputation for identical contexts within the same request.
###### Provide feedback to improve future suggestions
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/347e3b40-a487-42b7-aed3-004edafca4ba/upvote)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/347e3b40-a487-42b7-aed3-004edafca4ba?what_not_true=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/347e3b40-a487-42b7-aed3-004edafca4ba?what_out_of_scope=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/347e3b40-a487-42b7-aed3-004edafca4ba?what_not_in_standard=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/347e3b40-a487-42b7-aed3-004edafca4ba)
</details>
<sub>
💬 Looking for more details? Reply to this comment to chat with Korbit.
</sub>
<!--- korbi internal id:f1df160b-29c2-40d4-9c76-c8ce07be33a2 -->
[](f1df160b-29c2-40d4-9c76-c8ce07be33a2)
##########
superset/utils/rls.py:
##########
@@ -113,3 +113,51 @@ def get_predicates_for_table(
)
for predicate in dataset.get_sqla_row_level_filters()
]
+
+
+def collect_rls_predicates_for_sql(
+ sql: str,
+ database: Database,
+ catalog: str | None,
+ schema: str,
+) -> list[str]:
+ """
+ Collect all RLS predicates that would be applied to tables in the given
SQL.
+
+ This is used for cache key generation for virtual datasets to ensure that
+ different users with different RLS rules get different cache keys.
+
+ :param sql: The SQL query to analyze
+ :param database: The database the query runs against
+ :param catalog: The default catalog for the query
+ :param schema: The default schema for the query
+ :return: List of RLS predicate strings that would be applied
+ """
+ from superset.sql.parse import SQLScript
+
+ try:
+ parsed_script = SQLScript(sql, engine=database.db_engine_spec.engine)
+ all_predicates: list[str] = []
+
+ for statement in parsed_script.statements:
+ for table in statement.tables:
+ # fully qualify table
+ qualified_table = Table(
+ table.table,
+ table.schema or schema,
+ table.catalog or catalog,
+ )
+
+ predicates = get_predicates_for_table(
+ qualified_table,
+ database,
+ database.get_default_catalog(),
+ )
+ all_predicates.extend(predicates)
Review Comment:
### Redundant database calls for duplicate tables <sub></sub>
<details>
<summary>Tell me more</summary>
###### What is the issue?
The function makes redundant database calls for duplicate tables across
statements, as it doesn't deduplicate tables before calling
get_predicates_for_table.
###### Why this matters
When the same table appears in multiple statements, the expensive
get_predicates_for_table function (which performs database queries) will be
called multiple times for the same table, causing unnecessary database overhead
and slower performance.
###### Suggested change ∙ *Feature Preview*
Collect all unique qualified tables first, then iterate through them once to
get predicates:
```python
# Collect all unique qualified tables first
unique_tables = set()
for statement in parsed_script.statements:
for table in statement.tables:
qualified_table = Table(
table.table,
table.schema or schema,
table.catalog or catalog,
)
unique_tables.add(qualified_table)
# Get predicates for each unique table
all_predicates: list[str] = []
for qualified_table in unique_tables:
predicates = get_predicates_for_table(
qualified_table,
database,
database.get_default_catalog(),
)
all_predicates.extend(predicates)
```
###### Provide feedback to improve future suggestions
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/a06a6159-1d39-4fa9-bb54-9e6134cceb0d/upvote)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/a06a6159-1d39-4fa9-bb54-9e6134cceb0d?what_not_true=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/a06a6159-1d39-4fa9-bb54-9e6134cceb0d?what_out_of_scope=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/a06a6159-1d39-4fa9-bb54-9e6134cceb0d?what_not_in_standard=true)
[](https://app.korbit.ai/feedback/aa91ff46-6083-4491-9416-b83dd1994b51/a06a6159-1d39-4fa9-bb54-9e6134cceb0d)
</details>
<sub>
💬 Looking for more details? Reply to this comment to chat with Korbit.
</sub>
<!--- korbi internal id:9b4d5a50-87de-4fec-a91d-66bc4d398160 -->
[](9b4d5a50-87de-4fec-a91d-66bc4d398160)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]