codeant-ai-for-open-source[bot] commented on code in PR #34619:
URL: https://github.com/apache/superset/pull/34619#discussion_r3464440875


##########
superset/db_engine_specs/databricks.py:
##########
@@ -277,6 +283,71 @@ class 
DatabricksDynamicBaseEngineSpec(BasicParametersMixin, DatabricksBaseEngine
         "port": "port",
     }
 
+    # OAuth2 endpoints for different cloud providers
+    _oauth2_endpoints = {
+        "aws": {
+            "authorization_request_uri": 
"https://accounts.cloud.databricks.com/oidc/accounts/{}/v1/authorize";,
+            "token_request_uri": 
"https://accounts.cloud.databricks.com/oidc/accounts/{}/v1/token";,
+        },
+        "azure": {
+            "authorization_request_uri": 
"https://login.microsoftonline.com/{}/oauth2/v2.0/authorize";,
+            "token_request_uri": 
"https://login.microsoftonline.com/{}/oauth2/v2.0/token";,
+        },
+        "gcp": {
+            "authorization_request_uri": 
"https://accounts.gcp.databricks.com/oidc/accounts/{}/v1/authorize";,
+            "token_request_uri": 
"https://accounts.gcp.databricks.com/oidc/accounts/{}/v1/token";,
+        },
+    }

Review Comment:
   **Suggestion:** The OAuth endpoint templates contain `{}` placeholders but 
no formatting step ever fills them with account/tenant identifiers. This 
generates invalid authorization/token URLs (for example paths with empty or 
literal placeholder segments), causing OAuth requests to fail against real 
providers. Build concrete URIs from required config values before sending users 
to OAuth endpoints. [api mismatch]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Databricks OAuth2 login redirects to invalid authorization URLs.
   - ⚠️ Users cannot complete OAuth2 authentication for Databricks.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure an OAuth2 client for Databricks (either globally in 
`DATABASE_OAUTH2_CLIENTS`
   or per-database in `encrypted_extra["oauth2_client_info"]`) so that
   `Database.get_oauth2_config()` returns a valid `OAuth2ClientConfig` (see
   `superset/models/core.py:33-43` and 
`superset/db_engine_specs/base.py:725-175`).
   
   2. Trigger OAuth2 for a Databricks database (native or Python connector) 
without an
   existing token, so `Database.start_oauth2_dance()` calls
   `db_engine_spec.start_oauth2_dance(self)`; 
`BaseEngineSpec.start_oauth2_dance` at
   `superset/db_engine_specs/base.py:643-143` then calls
   `cls.get_oauth2_authorization_uri(oauth2_config, state, code_verifier=...)`.
   
   3. In `DatabricksNativeEngineSpec.get_oauth2_authorization_uri` or
   `DatabricksPythonConnectorEngineSpec.get_oauth2_authorization_uri`
   (`superset/db_engine_specs/databricks.py:158-188` and `435-465`), the method 
detects the
   cloud provider via `_detect_cloud_provider` and overwrites
   `config["authorization_request_uri"]` with the corresponding template from
   `_oauth2_endpoints` defined at 
`superset/db_engine_specs/databricks.py:287-300`, such as
   `"https://accounts.cloud.databricks.com/oidc/accounts/{}/v1/authorize"`, 
without
   substituting the `{}` placeholder.
   
   4. `BaseEngineSpec.get_oauth2_authorization_uri` at
   `superset/db_engine_specs/base.py:178-206` then uses this unformatted 
template as `uri`
   and appends query parameters, producing an authorization URL whose path 
still contains
   `"{}"` instead of an account/tenant identifier, leading the identity provider
   (Databricks/Azure/GCP) to reject or 404 the request when the browser is 
redirected.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=4294c1fc50894e018e6fc57af0d88357&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=4294c1fc50894e018e6fc57af0d88357&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/databricks.py
   **Line:** 287:300
   **Comment:**
        *Api Mismatch: The OAuth endpoint templates contain `{}` placeholders 
but no formatting step ever fills them with account/tenant identifiers. This 
generates invalid authorization/token URLs (for example paths with empty or 
literal placeholder segments), causing OAuth requests to fail against real 
providers. Build concrete URIs from required config values before sending users 
to OAuth endpoints.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F34619&comment_hash=5bc1116b8f5dd2e55b5657c267c6af50d66fbfc7b095d43a8c5d2506d86d50b7&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F34619&comment_hash=5bc1116b8f5dd2e55b5657c267c6af50d66fbfc7b095d43a8c5d2506d86d50b7&reaction=dislike'>👎</a>



##########
superset/db_engine_specs/databricks.py:
##########
@@ -474,6 +545,72 @@ class 
DatabricksNativeEngineSpec(DatabricksDynamicBaseEngineSpec):
     supports_dynamic_catalog = True
     supports_cross_catalog_queries = True
 
+    # OAuth 2.0 support
+    supports_oauth2 = True
+    oauth2_exception = OAuth2RedirectError
+    oauth2_scope = "sql"

Review Comment:
   **Suggestion:** OAuth2 is enabled without enforcing per-user query caching 
for this engine. Since authentication becomes user-token based, shared cache 
entries can be reused across users unless per-user caching is forced, which can 
leak one user's results to another. Add backend enforcement so OAuth2-enabled 
Databricks connections always use per-user cache keys. [cache]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Databricks OAuth2 query cache entries may leak between users.
   - ⚠️ Cache behavior conflicts with per-user auth expectations.
   - ⚠️ Security posture weaker than required for OAuth2 engines.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Observe that `DatabricksNativeEngineSpec` and 
`DatabricksPythonConnectorEngineSpec`
   declare `supports_oauth2 = True` and define `impersonate_user` to inject a 
per-user OAuth2
   access token into the SQLAlchemy URL/connect_args at
   `superset/db_engine_specs/databricks.py:328-349` and `116-139`, while
   `Database._get_sqla_engine` in `superset/models/core.py:28-49` calls
   `self.db_engine_spec.impersonate_user(self, effective_username, access_token,
   sqlalchemy_url, engine_kwargs)` when `self.impersonate_user` is true.
   
   2. Query caching keys are constructed in `viz.BaseViz.cache_key` at
   `superset/viz.py:504-513`, which calls
   `add_impersonation_cache_key_if_needed(self.datasource.database, 
cache_dict)` before
   hashing, so per-user cache separation only happens if that helper adds an
   `"impersonation_key"`.
   
   3. `add_impersonation_cache_key_if_needed` in 
`superset/utils/cache_keys.py:33-25` adds an
   impersonation-based cache key only when one of the following holds: (a) 
feature flag
   `CACHE_IMPERSONATION` is enabled and `database.impersonate_user` is true, 
(b) feature flag
   `CACHE_QUERY_BY_USER` is enabled, or (c) `database.extra` JSON sets 
`"per_user_caching":
   true`; it does not automatically enforce per-user cache keys for 
OAuth2-enabled engines.
   
   4. In a Databricks OAuth2 configuration where `supports_oauth2` is true, the 
connection
   uses per-user tokens via `impersonate_user`, but the admin has neither 
enabled
   `per_user_caching` nor the relevant feature flags, different users issuing 
the same query
   against the same Databricks database will get identical cache keys and share 
cached
   results, potentially exposing one user's query results to another despite 
authentication
   being per-user.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=804b505714244070a01454a43cb7f74b&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=804b505714244070a01454a43cb7f74b&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/databricks.py
   **Line:** 549:551
   **Comment:**
        *Cache: OAuth2 is enabled without enforcing per-user query caching for 
this engine. Since authentication becomes user-token based, shared cache 
entries can be reused across users unless per-user caching is forced, which can 
leak one user's results to another. Add backend enforcement so OAuth2-enabled 
Databricks connections always use per-user cache keys.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F34619&comment_hash=66e40c18f0f257709a32d9f69959118454711b1d2b06132fd71e0ebcefef581d&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F34619&comment_hash=66e40c18f0f257709a32d9f69959118454711b1d2b06132fd71e0ebcefef581d&reaction=dislike'>👎</a>



##########
superset/db_engine_specs/databricks.py:
##########
@@ -474,6 +545,72 @@ class 
DatabricksNativeEngineSpec(DatabricksDynamicBaseEngineSpec):
     supports_dynamic_catalog = True
     supports_cross_catalog_queries = True
 
+    # OAuth 2.0 support
+    supports_oauth2 = True
+    oauth2_exception = OAuth2RedirectError
+    oauth2_scope = "sql"
+
+    # OAuth2 endpoints are determined dynamically based on cloud provider
+    oauth2_authorization_request_uri = ""  # Set dynamically
+    oauth2_token_request_uri = ""  # Set dynamically
+

Review Comment:
   **Suggestion:** The class declares an empty default token endpoint while 
relying on base refresh-token logic, which uses `config["token_request_uri"]` 
directly. With the documented auto-detection setup (no explicit token URI), 
refresh requests will call an empty URL and fail at runtime. Ensure a concrete 
token endpoint is resolved for refresh flows too, not only initial code 
exchange. [incomplete implementation]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Databricks OAuth2 access tokens cannot be refreshed.
   - ⚠️ Users lose connectivity once initial tokens expire.
   - ⚠️ Admins cannot rely on documented optional token URI default.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure a Databricks OAuth2 client only with `id` and `secret` in
   `DATABASE_OAUTH2_CLIENTS` (leaving `"token_request_uri"` unset), so
   `BaseEngineSpec.get_oauth2_config` at 
`superset/db_engine_specs/base.py:725-175` builds a
   config where `config["token_request_uri"]` falls back to 
`cls.oauth2_token_request_uri`.
   
   2. For `DatabricksNativeEngineSpec` or `DatabricksPythonConnectorEngineSpec`,
   `oauth2_token_request_uri` is explicitly set to an empty string at
   `superset/db_engine_specs/databricks.py:555-556` and `432-433`, so
   `config["token_request_uri"]` becomes `""` when no explicit token URI is 
provided.
   
   3. After an initial token has been obtained (despite other issues), when the 
access token
   expires, `superset/utils/oauth2.get_oauth2_access_token` at
   `superset/utils/oauth2.py:9-45` calls `refresh_oauth2_token`, which in turn 
calls
   `db_engine_spec.get_oauth2_fresh_token(config, token.refresh_token)` using 
the same
   `config` with `token_request_uri=""`.
   
   4. Since Databricks specs do not override `get_oauth2_fresh_token`,
   `BaseEngineSpec.get_oauth2_fresh_token` at 
`superset/db_engine_specs/base.py:244-260` uses
   `config["token_request_uri"]` as `uri` in `requests.post(uri, data=req_body,
   timeout=timeout)`; with `uri=""`, the HTTP client raises an invalid-URL 
error and token
   refresh fails, breaking Databricks OAuth2 connections once tokens expire 
when admins rely
   on engine defaults for the token URI.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=efd690a58c0e4718916ac9171ac03293&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=efd690a58c0e4718916ac9171ac03293&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/databricks.py
   **Line:** 555:556
   **Comment:**
        *Incomplete Implementation: The class declares an empty default token 
endpoint while relying on base refresh-token logic, which uses 
`config["token_request_uri"]` directly. With the documented auto-detection 
setup (no explicit token URI), refresh requests will call an empty URL and fail 
at runtime. Ensure a concrete token endpoint is resolved for refresh flows too, 
not only initial code exchange.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F34619&comment_hash=6e51367743b9da081b295f374adb88a5c2ec4aae3d19248f0533244df4b1994b&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F34619&comment_hash=6e51367743b9da081b295f374adb88a5c2ec4aae3d19248f0533244df4b1994b&reaction=dislike'>👎</a>



##########
superset/db_engine_specs/databricks.py:
##########
@@ -474,6 +545,72 @@ class 
DatabricksNativeEngineSpec(DatabricksDynamicBaseEngineSpec):
     supports_dynamic_catalog = True
     supports_cross_catalog_queries = True
 
+    # OAuth 2.0 support
+    supports_oauth2 = True
+    oauth2_exception = OAuth2RedirectError
+    oauth2_scope = "sql"
+
+    # OAuth2 endpoints are determined dynamically based on cloud provider
+    oauth2_authorization_request_uri = ""  # Set dynamically
+    oauth2_token_request_uri = ""  # Set dynamically
+
+    @classmethod
+    def get_oauth2_authorization_uri(
+        cls,
+        config: "OAuth2ClientConfig",
+        state: "OAuth2State",
+        code_verifier: str | None = None,
+    ) -> str:
+        """
+        Return URI for initial OAuth2 request with dynamic endpoint detection.
+        """
+        from superset import db
+        from superset.models.core import Database
+
+        # Get the database to detect cloud provider
+        database_id = state["database_id"]
+        if database := db.session.get(Database, database_id):
+            provider = cls._detect_cloud_provider(database)
+            # Update config with the correct authorization URI for the cloud 
provider
+            from typing import cast
+
+            config = cast(
+                "OAuth2ClientConfig",
+                dict(config)
+                | {
+                    "authorization_request_uri": 
cls._oauth2_endpoints[provider][
+                        "authorization_request_uri"
+                    ]
+                },
+            )
+
+        return super().get_oauth2_authorization_uri(config, state, 
code_verifier)
+
+    @classmethod
+    def get_oauth2_token(
+        cls,
+        config: "OAuth2ClientConfig",
+        code: str,
+        code_verifier: str | None = None,
+    ) -> "OAuth2TokenResponse":
+        """
+        Exchange authorization code for refresh/access tokens with dynamic 
endpoint.
+
+        Note: For token exchange, we need the database context from the state.
+        This is a limitation of the current OAuth2 flow design.
+        """
+        # For now, fall back to AWS endpoints for token exchange
+        # TODO: Improve OAuth2 flow to pass database context to token exchange
+        from typing import cast
+
+        config = cast(
+            "OAuth2ClientConfig",
+            dict(config)
+            | {"token_request_uri": 
cls._oauth2_endpoints["aws"]["token_request_uri"]},
+        )
+

Review Comment:
   **Suggestion:** Token exchange is hardcoded to AWS endpoints for all 
Databricks providers. For Azure and GCP connections this sends authorization 
codes to the wrong token endpoint, which will fail token exchange and break 
OAuth login. Use the same provider-specific endpoint resolution used for 
authorization URI generation. [api mismatch]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Azure Databricks OAuth2 login fails at token exchange.
   - ❌ GCP Databricks OAuth2 login fails at token exchange.
   - ⚠️ Multi-cloud Databricks OAuth2 support is effectively unusable.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure a Databricks database whose hostname indicates Azure or GCP 
(e.g.
   `adb-...azuredatabricks.net` or `...gcp.databricks.com`) so that 
`_detect_cloud_provider`
   in `superset/db_engine_specs/databricks.py:303-326` returns `"azure"` or 
`"gcp"`, and set
   up a valid OAuth2 client so `Database.get_oauth2_config()` succeeds.
   
   2. Complete the OAuth2 authorization step so the provider (Azure or GCP) 
redirects back to
   Superset with an authorization `code`; `DatabaseOAuth2Command.run` in
   `superset/commands/database/oauth2.py:40-39` then retrieves `oauth2_config` 
via
   `self._database.get_oauth2_config()` and calls
   `self._database.db_engine_spec.get_oauth2_token(oauth2_config, code, 
code_verifier)`.
   
   3. In `DatabricksNativeEngineSpec.get_oauth2_token` and
   `DatabricksPythonConnectorEngineSpec.get_oauth2_token`
   (`superset/db_engine_specs/databricks.py:191-213` and `468-490`), the code 
constructs a
   new config via `dict(config) | {"token_request_uri":
   cls._oauth2_endpoints["aws"]["token_request_uri"]}`, forcing the token 
exchange to always
   target the AWS Databricks token endpoint defined in
   `_oauth2_endpoints["aws"]["token_request_uri"]` regardless of the actual 
cloud provider.
   
   4. `BaseEngineSpec.get_oauth2_token` at 
`superset/db_engine_specs/base.py:209-241` then
   POSTs the Azure- or GCP-issued authorization code to this AWS endpoint, 
which will reject
   the foreign code, causing an HTTP error and preventing 
`DatabaseOAuth2Command.run` from
   storing any tokens for non-AWS Databricks deployments.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=5cba532de75643aab07144fd79864057&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=5cba532de75643aab07144fd79864057&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/databricks.py
   **Line:** 607:611
   **Comment:**
        *Api Mismatch: Token exchange is hardcoded to AWS endpoints for all 
Databricks providers. For Azure and GCP connections this sends authorization 
codes to the wrong token endpoint, which will fail token exchange and break 
OAuth login. Use the same provider-specific endpoint resolution used for 
authorization URI generation.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F34619&comment_hash=704f5eeaedf296be978f15474d80189c481a4f15f804a32231fa2a31c6a5e5b9&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F34619&comment_hash=704f5eeaedf296be978f15474d80189c481a4f15f804a32231fa2a31c6a5e5b9&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to