codeant-ai-for-open-source[bot] commented on code in PR #40662:
URL: https://github.com/apache/superset/pull/40662#discussion_r3384724467


##########
superset/commands/sql_lab/estimate.py:
##########
@@ -69,6 +77,58 @@ def validate(self) -> None:
             )
         security_manager.raise_for_access(database=self._database)
 
+    def _apply_sql_security(self, sql: str) -> str:
+        """Run the disallowed-function/table, DML and RLS controls against the
+        SQL to be estimated, mirroring ``sql_lab.execute_sql_statements``.
+
+        Returns the SQL with RLS predicates injected (when ``RLS_IN_SQLLAB`` is
+        enabled), so the cost estimate reflects the same constrained query the
+        user would actually be allowed to run.
+        """
+        db_engine_spec = self._database.db_engine_spec
+        parsed_script = SQLScript(sql, engine=db_engine_spec.engine)
+
+        disallowed_functions = app.config["DISALLOWED_SQL_FUNCTIONS"].get(
+            db_engine_spec.engine,
+            set(),
+        )
+        if disallowed_functions and parsed_script.check_functions_present(
+            disallowed_functions
+        ):
+            raise SupersetDisallowedSQLFunctionException(disallowed_functions)
+
+        disallowed_tables = app.config["DISALLOWED_SQL_TABLES"].get(
+            db_engine_spec.engine,
+            set(),
+        )
+        if disallowed_tables and 
parsed_script.check_tables_present(disallowed_tables):
+            found_tables = set()
+            for statement in parsed_script.statements:
+                present = {table.table.lower() for table in statement.tables}
+                for table in disallowed_tables:
+                    if table.lower() in present:
+                        found_tables.add(table)
+            raise SupersetDisallowedSQLTableException(found_tables or 
disallowed_tables)
+
+        if parsed_script.has_mutation() and not self._database.allow_dml:
+            raise SupersetDMLNotAllowedException()
+
+        if is_feature_enabled("RLS_IN_SQLLAB"):
+            # Resolve the default catalog/schema the same way the execution 
path
+            # does (``SQLExecutor._prepare_scripts`` /
+            # ``sql_lab.execute_sql_statements``) before injecting RLS. Without
+            # this, unqualified table references can't be matched against
+            # datasets registered under the database's default schema, so the
+            # estimate would skip predicates the real query enforces — breaking
+            # the security parity this command exists to provide.
+            catalog = self._catalog or self._database.get_default_catalog()
+            schema = self._schema or 
self._database.get_default_schema(catalog) or ""

Review Comment:
   **Suggestion:** Using `get_default_schema()` here bypasses engine-specific 
`get_default_schema_for_query()` security logic (for example Postgres rejecting 
`SET search_path` in query text). As a result, cost estimation can still 
execute/query-estimate SQL that changes effective schema while RLS is resolved 
against a different default schema, creating a remaining 
RLS-bypass/data-existence oracle path. Resolve schema using the same per-query 
path as execution (including query-text security checks) before applying RLS. 
[security]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ `/sqllab/estimate/` bypasses Postgres search_path security checks.
   - ❌ RLS predicates resolved against wrong schema for estimation.
   - ❌ Cost estimates can reveal RLS-protected row cardinalities.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Observe the SQL Lab cost estimation API endpoint `estimate_query_cost` in
   `superset/sqllab/api.py:157-176`, which deserializes the request with
   `EstimateQueryCostSchema` from `superset/sqllab/schemas.py:37-52` and 
instantiates
   `QueryEstimationCommand(model)` from 
`superset/commands/sql_lab/estimate.py:52-66`, then
   calls `command.run()`.
   
   2. In `QueryEstimationCommand._apply_sql_security` at
   `superset/commands/sql_lab/estimate.py:80-130`, note that when 
`RLS_IN_SQLLAB` is enabled,
   the default catalog and schema used for RLS are resolved via:
   
      - `catalog = self._catalog or self._database.get_default_catalog()` 
(`estimate.py:124`)
   
      - `schema = self._schema or self._database.get_default_schema(catalog) or 
""`
      (`estimate.py:125`)
   
      This uses `Database.get_default_schema()` from 
`superset/models/core.py:17-21`, which
      in turn calls `db_engine_spec.get_default_schema(self, catalog)`, and 
**does not** call
      `Database.get_default_schema_for_query()`.
   
   3. Compare this to the execution path in `superset/sql_lab.py:420-499`: when 
executing SQL
   via `execute_sql_statements`, RLS application resolves the effective schema 
with the
   per-query method `default_schema = 
query.database.get_default_schema_for_query(query)` at
   `sql_lab.py:31-32` (in the shown chunk), and then applies RLS with
   `apply_rls(query.database, query.catalog, default_schema, statement)` at
   `sql_lab.py:32-33`. For Postgres, 
`PostgresEngineSpec.get_default_schema_for_query` in
   `superset/db_engine_specs/postgres.py:605-31` calls 
`process_jinja_sql(query.sql,
   database, template_params).script.get_settings()` and explicitly rejects any 
query that
   sets `search_path` by raising `SupersetSecurityException` if `"search_path" 
in settings`
   (`postgres.py:18-29`), providing a security gate against per-query schema 
manipulation.
   
   4. On a PostgreSQL database with RLS enabled (RLS predicates are collected in
   `superset/utils/rls.py:32-71` and matched to datasets via 
`get_predicates_for_table` at
   `rls.py:74-136` using the passed `schema`), send a SQL Lab cost-estimation 
request to
   `/sqllab/estimate/` (handled by `estimate_query_cost` in 
`superset/sqllab/api.py:157-176`)
   with a payload like:
   
      - `database_id` pointing to the Postgres database
   
      - `sql` containing `SET search_path = sensitive_schema; SELECT * FROM 
orders;`
   
      - `catalog` and `schema` omitted (so `self._catalog`/`self._schema` are 
`None`)
   
      During estimation:
   
      - `_apply_sql_security` uses `Database.get_default_schema(catalog)` 
instead of
      `get_default_schema_for_query`, so the Postgres-specific `search_path` 
check in
      `PostgresEngineSpec.get_default_schema_for_query` (`postgres.py:18-29`) 
is never
      invoked, and the `SET search_path` statement is not blocked.
   
      - RLS is applied using the static default schema (e.g. `'public'`) passed 
into
      `apply_rls(self._database, catalog, schema, statement)` at 
`estimate.py:126-127`, so
      `apply_rls` qualifies unqualified tables with this static schema
      (`Table.qualify(catalog, schema)` at `rls.py:56-57`) and looks up 
predicates for
      `catalog/static_schema/table` in `get_predicates_for_table` 
(`rls.py:89-103`).
   
      - The EXPLAIN/cost itself is executed by
      `db_engine_specs.base.BaseEngineSpec.estimate_query_cost`
      (`superset/db_engine_specs/base.py:2052-38`), which opens a real 
connection with
      `database.get_raw_connection(catalog=catalog, schema=self._schema, 
source=source)` at
      `base.py:25-28` and runs the SQL script including the `SET search_path` 
statement
      (`base.py:23-38`), so Postgres evaluates `SELECT * FROM orders` under the
      **search_path-modified** effective schema (e.g. 
`sensitive_schema.orders`).
   
      As a result, in this estimation path an attacker can:
   
      - Run SQL containing `SET search_path = sensitive_schema; SELECT ...` 
without hitting
      the `search_path` security check (since `get_default_schema_for_query` is 
bypassed).
   
      - Have RLS predicates resolved using the static default schema
      (`Database.get_default_schema`) instead of the query's effective schema, 
so RLS for
      `sensitive_schema.orders` can be skipped or mismatched while Postgres 
still estimates
      the query under the modified `search_path`.
   
      - Use the `/sqllab/estimate/` endpoint as a remaining RLS/data-existence 
oracle, even
      though the main execution path (`execute_sql_statements` in
      `superset/sql_lab.py:420-499`) would reject the same SQL via
      `get_default_schema_for_query` and/or apply RLS against the correct 
per-query schema.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=d0c25469cc6e47ad97141754697013fa&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=d0c25469cc6e47ad97141754697013fa&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/commands/sql_lab/estimate.py
   **Line:** 124:125
   **Comment:**
        *Security: Using `get_default_schema()` here bypasses engine-specific 
`get_default_schema_for_query()` security logic (for example Postgres rejecting 
`SET search_path` in query text). As a result, cost estimation can still 
execute/query-estimate SQL that changes effective schema while RLS is resolved 
against a different default schema, creating a remaining 
RLS-bypass/data-existence oracle path. Resolve schema using the same per-query 
path as execution (including query-text security checks) before applying RLS.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40662&comment_hash=75d961aec7bf17a20c5eb072fc4e66508e57d0af529cd22244c7b87756384a32&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40662&comment_hash=75d961aec7bf17a20c5eb072fc4e66508e57d0af529cd22244c7b87756384a32&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to