codeant-ai-for-open-source[bot] commented on code in PR #40776:
URL: https://github.com/apache/superset/pull/40776#discussion_r3414664981
##########
superset/db_engine_specs/bigquery.py:
##########
@@ -742,6 +742,15 @@ def adjust_engine_params(
) -> tuple[URL, dict[str, Any]]:
if catalog:
uri = uri.set(host=catalog, database="")
+ if schema:
+ if not uri.host and uri.database:
+ # Triple-slash form (e.g., bigquery:///project): move the
project
+ # from database to host before setting the default dataset,
otherwise
+ # setting database would overwrite the project.
+ uri = uri.set(host=uri.database, database="")
+ # Setting database to schema enables the BigQuery default dataset
so
Review Comment:
**Suggestion:** This change enables per-query schema switching by rewriting
the SQLAlchemy URL database component, but `BigQueryEngineSpec` still does not
declare dynamic schema support. Superset's schema-resolution and permission
logic only uses `query.schema` when `supports_dynamic_schema` is true, so
queries can now run against a different dataset than the one used for
access-control checks. Mark the engine as supporting dynamic schemas (and keep
schema extraction behavior aligned) so authorization and query execution stay
consistent. [security]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ SQL Lab RLS uses wrong BigQuery default dataset.
- ❌ Dataset-match checks authorize against different dataset than executed.
- ⚠️ Native chart queries may bypass intended BigQuery schema restrictions.
- ⚠️ Estimate command RLS diverges from execution for BigQuery.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Configure a BigQuery database with a static default dataset in its
SQLAlchemy URI, for
example `bigquery://project/default_dataset`, so `database.sqlalchemy_uri`
encodes
`default_dataset` (BigQueryEngineSpec defined in
`superset/db_engine_specs/bigquery.py:130+`).
2. Execute a SQL Lab query against this BigQuery database selecting a
different
schema/dataset (e.g. `other_dataset`) in the UI, which populates
`Query.schema` and is
passed as the `schema` argument into `Database.get_sqla_engine`
(superset/models/core.py:517–523) and then into `Database._get_sqla_engine`
(superset/models/core.py:516–523).
3. Observe that `Database._get_sqla_engine` calls
`self.db_engine_spec.adjust_engine_params(uri=sqlalchemy_url,
connect_args=...,
catalog=catalog, schema=schema)` (superset/models/core.py:536–542), and
BigQueryEngineSpec.adjust_engine_params applies the new code path: when
`schema` is set it
rewrites the SQLAlchemy URL database component to `schema`
(superset/db_engine_specs/bigquery.py:737–754), so the BigQuery driver now
uses
`other_dataset` as the default dataset for unqualified tables at execution
time.
4. In parallel, Superset's security layer calls
`database.get_default_schema_for_query(query, template_params)` via
`SupersetSecurityManager.raise_for_access`
(superset/security/manager.py:3160–3239) and
via RLS in SQL Lab (superset/sql_lab.py:20–23,
superset/commands/sql_lab/estimate.py:14–39); this resolves to
`BaseEngineSpec.get_default_schema_for_query`
(superset/db_engine_specs/base.py:45–47),
which first checks `cls.supports_dynamic_schema` and, because
BigQueryEngineSpec does not
override it, falls through the non-dynamic branch that ignores
`query.schema`, reads
`database.sqlalchemy_uri` (static `default_dataset`), and calls
`cls.get_default_schema(...)` using the inspector.
5. As a result, default-schema resolution for access-control and RLS (driven
by
`default_schema` in `security/manager.py`, where tables are qualified with
`schema=default_schema` before dataset matching at lines 65–71) remains
bound to the
static `default_dataset` in the connection, while the actual query executes
in
`other_dataset` due to the adjusted engine params; this desynchronizes which
dataset
security checks see from where the query actually runs, and can let
unqualified table
references in `other_dataset` slip past schema-scoped permissions designed
for
`default_dataset`.
6. Marking BigQueryEngineSpec as `supports_dynamic_schema = True` (mirroring
other engines
like Snowflake and Presto that implement per-query schema changes in their
`adjust_engine_params`) would cause `get_default_schema_for_query` to use
`query.schema`
for BigQuery, aligning the default schema used for permission checks and RLS
with the
dataset actually selected by the per-query URL rewrite.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=63685a985ee44464b24433a6938434f3&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=63685a985ee44464b24433a6938434f3&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/db_engine_specs/bigquery.py
**Line:** 745:751
**Comment:**
*Security: This change enables per-query schema switching by rewriting
the SQLAlchemy URL database component, but `BigQueryEngineSpec` still does not
declare dynamic schema support. Superset's schema-resolution and permission
logic only uses `query.schema` when `supports_dynamic_schema` is true, so
queries can now run against a different dataset than the one used for
access-control checks. Mark the engine as supporting dynamic schemas (and keep
schema extraction behavior aligned) so authorization and query execution stay
consistent.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40776&comment_hash=04856396f5d034b3083bdc9d86b5ded674d6f1b087dc84c1717b55d411082f49&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40776&comment_hash=04856396f5d034b3083bdc9d86b5ded674d6f1b087dc84c1717b55d411082f49&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]