codeant-ai-for-open-source[bot] commented on code in PR #40662:
URL: https://github.com/apache/superset/pull/40662#discussion_r3384724467
##########
superset/commands/sql_lab/estimate.py:
##########
@@ -69,6 +77,58 @@ def validate(self) -> None:
)
security_manager.raise_for_access(database=self._database)
+ def _apply_sql_security(self, sql: str) -> str:
+ """Run the disallowed-function/table, DML and RLS controls against the
+ SQL to be estimated, mirroring ``sql_lab.execute_sql_statements``.
+
+ Returns the SQL with RLS predicates injected (when ``RLS_IN_SQLLAB`` is
+ enabled), so the cost estimate reflects the same constrained query the
+ user would actually be allowed to run.
+ """
+ db_engine_spec = self._database.db_engine_spec
+ parsed_script = SQLScript(sql, engine=db_engine_spec.engine)
+
+ disallowed_functions = app.config["DISALLOWED_SQL_FUNCTIONS"].get(
+ db_engine_spec.engine,
+ set(),
+ )
+ if disallowed_functions and parsed_script.check_functions_present(
+ disallowed_functions
+ ):
+ raise SupersetDisallowedSQLFunctionException(disallowed_functions)
+
+ disallowed_tables = app.config["DISALLOWED_SQL_TABLES"].get(
+ db_engine_spec.engine,
+ set(),
+ )
+ if disallowed_tables and
parsed_script.check_tables_present(disallowed_tables):
+ found_tables = set()
+ for statement in parsed_script.statements:
+ present = {table.table.lower() for table in statement.tables}
+ for table in disallowed_tables:
+ if table.lower() in present:
+ found_tables.add(table)
+ raise SupersetDisallowedSQLTableException(found_tables or
disallowed_tables)
+
+ if parsed_script.has_mutation() and not self._database.allow_dml:
+ raise SupersetDMLNotAllowedException()
+
+ if is_feature_enabled("RLS_IN_SQLLAB"):
+ # Resolve the default catalog/schema the same way the execution
path
+ # does (``SQLExecutor._prepare_scripts`` /
+ # ``sql_lab.execute_sql_statements``) before injecting RLS. Without
+ # this, unqualified table references can't be matched against
+ # datasets registered under the database's default schema, so the
+ # estimate would skip predicates the real query enforces — breaking
+ # the security parity this command exists to provide.
+ catalog = self._catalog or self._database.get_default_catalog()
+ schema = self._schema or
self._database.get_default_schema(catalog) or ""
Review Comment:
**Suggestion:** Using `get_default_schema()` here bypasses engine-specific
`get_default_schema_for_query()` security logic (for example Postgres rejecting
`SET search_path` in query text). As a result, cost estimation can still
execute/query-estimate SQL that changes effective schema while RLS is resolved
against a different default schema, creating a remaining
RLS-bypass/data-existence oracle path. Resolve schema using the same per-query
path as execution (including query-text security checks) before applying RLS.
[security]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ `/sqllab/estimate/` bypasses Postgres search_path security checks.
- ❌ RLS predicates resolved against wrong schema for estimation.
- ❌ Cost estimates can reveal RLS-protected row cardinalities.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Observe the SQL Lab cost estimation API endpoint `estimate_query_cost` in
`superset/sqllab/api.py:157-176`, which deserializes the request with
`EstimateQueryCostSchema` from `superset/sqllab/schemas.py:37-52` and
instantiates
`QueryEstimationCommand(model)` from
`superset/commands/sql_lab/estimate.py:52-66`, then
calls `command.run()`.
2. In `QueryEstimationCommand._apply_sql_security` at
`superset/commands/sql_lab/estimate.py:80-130`, note that when
`RLS_IN_SQLLAB` is enabled,
the default catalog and schema used for RLS are resolved via:
- `catalog = self._catalog or self._database.get_default_catalog()`
(`estimate.py:124`)
- `schema = self._schema or self._database.get_default_schema(catalog) or
""`
(`estimate.py:125`)
This uses `Database.get_default_schema()` from
`superset/models/core.py:17-21`, which
in turn calls `db_engine_spec.get_default_schema(self, catalog)`, and
**does not** call
`Database.get_default_schema_for_query()`.
3. Compare this to the execution path in `superset/sql_lab.py:420-499`: when
executing SQL
via `execute_sql_statements`, RLS application resolves the effective schema
with the
per-query method `default_schema =
query.database.get_default_schema_for_query(query)` at
`sql_lab.py:31-32` (in the shown chunk), and then applies RLS with
`apply_rls(query.database, query.catalog, default_schema, statement)` at
`sql_lab.py:32-33`. For Postgres,
`PostgresEngineSpec.get_default_schema_for_query` in
`superset/db_engine_specs/postgres.py:605-31` calls
`process_jinja_sql(query.sql,
database, template_params).script.get_settings()` and explicitly rejects any
query that
sets `search_path` by raising `SupersetSecurityException` if `"search_path"
in settings`
(`postgres.py:18-29`), providing a security gate against per-query schema
manipulation.
4. On a PostgreSQL database with RLS enabled (RLS predicates are collected in
`superset/utils/rls.py:32-71` and matched to datasets via
`get_predicates_for_table` at
`rls.py:74-136` using the passed `schema`), send a SQL Lab cost-estimation
request to
`/sqllab/estimate/` (handled by `estimate_query_cost` in
`superset/sqllab/api.py:157-176`)
with a payload like:
- `database_id` pointing to the Postgres database
- `sql` containing `SET search_path = sensitive_schema; SELECT * FROM
orders;`
- `catalog` and `schema` omitted (so `self._catalog`/`self._schema` are
`None`)
During estimation:
- `_apply_sql_security` uses `Database.get_default_schema(catalog)`
instead of
`get_default_schema_for_query`, so the Postgres-specific `search_path`
check in
`PostgresEngineSpec.get_default_schema_for_query` (`postgres.py:18-29`)
is never
invoked, and the `SET search_path` statement is not blocked.
- RLS is applied using the static default schema (e.g. `'public'`) passed
into
`apply_rls(self._database, catalog, schema, statement)` at
`estimate.py:126-127`, so
`apply_rls` qualifies unqualified tables with this static schema
(`Table.qualify(catalog, schema)` at `rls.py:56-57`) and looks up
predicates for
`catalog/static_schema/table` in `get_predicates_for_table`
(`rls.py:89-103`).
- The EXPLAIN/cost itself is executed by
`db_engine_specs.base.BaseEngineSpec.estimate_query_cost`
(`superset/db_engine_specs/base.py:2052-38`), which opens a real
connection with
`database.get_raw_connection(catalog=catalog, schema=self._schema,
source=source)` at
`base.py:25-28` and runs the SQL script including the `SET search_path`
statement
(`base.py:23-38`), so Postgres evaluates `SELECT * FROM orders` under the
**search_path-modified** effective schema (e.g.
`sensitive_schema.orders`).
As a result, in this estimation path an attacker can:
- Run SQL containing `SET search_path = sensitive_schema; SELECT ...`
without hitting
the `search_path` security check (since `get_default_schema_for_query` is
bypassed).
- Have RLS predicates resolved using the static default schema
(`Database.get_default_schema`) instead of the query's effective schema,
so RLS for
`sensitive_schema.orders` can be skipped or mismatched while Postgres
still estimates
the query under the modified `search_path`.
- Use the `/sqllab/estimate/` endpoint as a remaining RLS/data-existence
oracle, even
though the main execution path (`execute_sql_statements` in
`superset/sql_lab.py:420-499`) would reject the same SQL via
`get_default_schema_for_query` and/or apply RLS against the correct
per-query schema.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=d0c25469cc6e47ad97141754697013fa&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=d0c25469cc6e47ad97141754697013fa&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/commands/sql_lab/estimate.py
**Line:** 124:125
**Comment:**
*Security: Using `get_default_schema()` here bypasses engine-specific
`get_default_schema_for_query()` security logic (for example Postgres rejecting
`SET search_path` in query text). As a result, cost estimation can still
execute/query-estimate SQL that changes effective schema while RLS is resolved
against a different default schema, creating a remaining
RLS-bypass/data-existence oracle path. Resolve schema using the same per-query
path as execution (including query-text security checks) before applying RLS.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40662&comment_hash=75d961aec7bf17a20c5eb072fc4e66508e57d0af529cd22244c7b87756384a32&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40662&comment_hash=75d961aec7bf17a20c5eb072fc4e66508e57d0af529cd22244c7b87756384a32&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]