codeant-ai-for-open-source[bot] commented on code in PR #40776:
URL: https://github.com/apache/superset/pull/40776#discussion_r3414664981


##########
superset/db_engine_specs/bigquery.py:
##########
@@ -742,6 +742,15 @@ def adjust_engine_params(
     ) -> tuple[URL, dict[str, Any]]:
         if catalog:
             uri = uri.set(host=catalog, database="")
+        if schema:
+            if not uri.host and uri.database:
+                # Triple-slash form (e.g., bigquery:///project): move the 
project
+                # from database to host before setting the default dataset, 
otherwise
+                # setting database would overwrite the project.
+                uri = uri.set(host=uri.database, database="")
+            # Setting database to schema enables the BigQuery default dataset 
so

Review Comment:
   **Suggestion:** This change enables per-query schema switching by rewriting 
the SQLAlchemy URL database component, but `BigQueryEngineSpec` still does not 
declare dynamic schema support. Superset's schema-resolution and permission 
logic only uses `query.schema` when `supports_dynamic_schema` is true, so 
queries can now run against a different dataset than the one used for 
access-control checks. Mark the engine as supporting dynamic schemas (and keep 
schema extraction behavior aligned) so authorization and query execution stay 
consistent. [security]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ SQL Lab RLS uses wrong BigQuery default dataset.
   - ❌ Dataset-match checks authorize against different dataset than executed.
   - ⚠️ Native chart queries may bypass intended BigQuery schema restrictions.
   - ⚠️ Estimate command RLS diverges from execution for BigQuery.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure a BigQuery database with a static default dataset in its 
SQLAlchemy URI, for
   example `bigquery://project/default_dataset`, so `database.sqlalchemy_uri` 
encodes
   `default_dataset` (BigQueryEngineSpec defined in
   `superset/db_engine_specs/bigquery.py:130+`).
   
   2. Execute a SQL Lab query against this BigQuery database selecting a 
different
   schema/dataset (e.g. `other_dataset`) in the UI, which populates 
`Query.schema` and is
   passed as the `schema` argument into `Database.get_sqla_engine`
   (superset/models/core.py:517–523) and then into `Database._get_sqla_engine`
   (superset/models/core.py:516–523).
   
   3. Observe that `Database._get_sqla_engine` calls
   `self.db_engine_spec.adjust_engine_params(uri=sqlalchemy_url, 
connect_args=...,
   catalog=catalog, schema=schema)` (superset/models/core.py:536–542), and
   BigQueryEngineSpec.adjust_engine_params applies the new code path: when 
`schema` is set it
   rewrites the SQLAlchemy URL database component to `schema`
   (superset/db_engine_specs/bigquery.py:737–754), so the BigQuery driver now 
uses
   `other_dataset` as the default dataset for unqualified tables at execution 
time.
   
   4. In parallel, Superset's security layer calls
   `database.get_default_schema_for_query(query, template_params)` via
   `SupersetSecurityManager.raise_for_access` 
(superset/security/manager.py:3160–3239) and
   via RLS in SQL Lab (superset/sql_lab.py:20–23,
   superset/commands/sql_lab/estimate.py:14–39); this resolves to
   `BaseEngineSpec.get_default_schema_for_query` 
(superset/db_engine_specs/base.py:45–47),
   which first checks `cls.supports_dynamic_schema` and, because 
BigQueryEngineSpec does not
   override it, falls through the non-dynamic branch that ignores 
`query.schema`, reads
   `database.sqlalchemy_uri` (static `default_dataset`), and calls
   `cls.get_default_schema(...)` using the inspector.
   
   5. As a result, default-schema resolution for access-control and RLS (driven 
by
   `default_schema` in `security/manager.py`, where tables are qualified with
   `schema=default_schema` before dataset matching at lines 65–71) remains 
bound to the
   static `default_dataset` in the connection, while the actual query executes 
in
   `other_dataset` due to the adjusted engine params; this desynchronizes which 
dataset
   security checks see from where the query actually runs, and can let 
unqualified table
   references in `other_dataset` slip past schema-scoped permissions designed 
for
   `default_dataset`.
   
   6. Marking BigQueryEngineSpec as `supports_dynamic_schema = True` (mirroring 
other engines
   like Snowflake and Presto that implement per-query schema changes in their
   `adjust_engine_params`) would cause `get_default_schema_for_query` to use 
`query.schema`
   for BigQuery, aligning the default schema used for permission checks and RLS 
with the
   dataset actually selected by the per-query URL rewrite.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=63685a985ee44464b24433a6938434f3&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=63685a985ee44464b24433a6938434f3&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/bigquery.py
   **Line:** 745:751
   **Comment:**
        *Security: This change enables per-query schema switching by rewriting 
the SQLAlchemy URL database component, but `BigQueryEngineSpec` still does not 
declare dynamic schema support. Superset's schema-resolution and permission 
logic only uses `query.schema` when `supports_dynamic_schema` is true, so 
queries can now run against a different dataset than the one used for 
access-control checks. Mark the engine as supporting dynamic schemas (and keep 
schema extraction behavior aligned) so authorization and query execution stay 
consistent.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40776&comment_hash=04856396f5d034b3083bdc9d86b5ded674d6f1b087dc84c1717b55d411082f49&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40776&comment_hash=04856396f5d034b3083bdc9d86b5ded674d6f1b087dc84c1717b55d411082f49&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to