codeant-ai-for-open-source[bot] commented on code in PR #41421:
URL: https://github.com/apache/superset/pull/41421#discussion_r3476352263
##########
superset/db_engine_specs/databricks.py:
##########
@@ -474,6 +576,74 @@ class
DatabricksNativeEngineSpec(DatabricksDynamicBaseEngineSpec):
supports_dynamic_catalog = True
supports_cross_catalog_queries = True
+ # OAuth 2.0 support
+ supports_oauth2 = True
+ oauth2_exception = OAuth2RedirectError
+ oauth2_scope = "sql"
+
+ # OAuth2 endpoints are determined dynamically based on cloud provider
+ oauth2_authorization_request_uri = "" # Set dynamically
+ oauth2_token_request_uri = "" # Set dynamically
+
+ @classmethod
+ def get_oauth2_authorization_uri(
+ cls,
+ config: "OAuth2ClientConfig",
+ state: "OAuth2State",
+ code_verifier: str | None = None,
+ ) -> str:
+ """
+ Return URI for initial OAuth2 request with dynamic endpoint detection.
+
+ A fully-resolved `authorization_request_uri` from
`DATABASE_OAUTH2_CLIENTS`
+ is preserved; only fall back to the auto-detected, account-resolved
+ endpoint when none is configured.
+ """
+ if not config.get("authorization_request_uri"):
+ from superset import db
+ from superset.models.core import Database
+
+ # Get the database to detect cloud provider
+ database_id = state["database_id"]
+ if database := db.session.get(Database, database_id):
+ provider = cls._detect_cloud_provider(database)
+ from typing import cast
+
+ config = cast(
+ "OAuth2ClientConfig",
+ dict(config)
+ | {
+ "authorization_request_uri":
cls._resolve_oauth2_endpoint(
+ database, provider, "authorization_request_uri"
+ )
+ },
+ )
+
+ return super().get_oauth2_authorization_uri(config, state,
code_verifier)
+
+ @classmethod
+ def get_oauth2_token(
+ cls,
+ config: "OAuth2ClientConfig",
+ code: str,
+ code_verifier: str | None = None,
+ ) -> "OAuth2TokenResponse":
+ """
+ Exchange authorization code for refresh/access tokens.
+
+ The token request URI is resolved when the OAuth2 config is built (see
+ `get_oauth2_config`) and already targets the correct cloud provider and
+ account. There is no database context here to auto-detect it, so fail
+ fast rather than POST to an unresolved endpoint when it is missing.
+ """
+ if not config.get("token_request_uri"):
+ raise OAuth2Error(
+ "Databricks OAuth2 token endpoint is not configured: provide a
"
+ "fully-resolved `token_request_uri` in
DATABASE_OAUTH2_CLIENTS."
+ )
+
+ return super().get_oauth2_token(config, code, code_verifier)
Review Comment:
**Suggestion:** This fails OAuth2 code exchange for the auto-detected flow:
`DATABASE_OAUTH2_CLIENTS` can omit `token_request_uri`, but this method now
hard-fails when it is missing. During callback, `OAuth2StoreTokenCommand`
reloads config via `database.get_oauth2_config()`, which still has an empty
token URI for Databricks, so token exchange always raises `OAuth2Error` instead
of using the detected provider/account endpoint. Resolve `token_request_uri`
dynamically (from database `cloud_provider` + `account_id`/`tenant_id`) before
exchange, or carry the resolved token endpoint through the OAuth state.
[incomplete implementation]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ Databricks (legacy) OAuth2 logins fail with default config.
- ❌ Tokens never stored by `OAuth2StoreTokenCommand.run`.
- ❌ Queries against OAuth-enabled Databricks databases cannot run.
- ⚠️ Behavior contradicts docs promising auto-detected token endpoint.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Configure global Databricks OAuth2 as documented at
`docs/user_docs_versioned_docs/version-6.0.0/configuration/databases.mdx:5-23`,
defining
`DATABASE_OAUTH2_CLIENTS` entries for `"Databricks (legacy)"` with `id`,
`secret`, and
`scope` but **no** `token_request_uri` (docs explicitly state endpoints are
auto-detected).
2. Create a Databricks database using the `"Databricks (legacy)"` engine so
that its
`db_engine_spec` is `DatabricksNativeEngineSpec`
(`superset/db_engine_specs/databricks.py:546-553`), and set `ENGINE
PARAMETERS` JSON to
include `cloud_provider` and `account_id`/`tenant_id` as described in the
docs
(`databases.mdx:45-63`).
3. Trigger the OAuth2 flow by running a query that requires OAuth for this
database;
`Database.start_oauth2_dance()` (`superset/models/core.py:27-35`) calls
`BaseEngineSpec.start_oauth2_dance()`
(`superset/db_engine_specs/base.py:11-83`), which
obtains `oauth2_config = database.get_oauth2_config()` (`base.py:73-75). For
global config
without `token_request_uri`, `BaseEngineSpec.get_oauth2_config()`
(`base.py:86-115`) sets
`config["token_request_uri"]` to `cls.oauth2_token_request_uri`, which for
`DatabricksNativeEngineSpec` is the empty string
(`superset/db_engine_specs/databricks.py:25-27`).
4. After the user authorizes with Databricks and the provider redirects back
to Superset
at `/api/v1/database/oauth2/` (`superset/databases/api.py:40-88`),
`OAuth2StoreTokenCommand.run()`
(`superset/commands/database/oauth2.py:7-58`) reloads
`oauth2_config = self._database.get_oauth2_config()` (line 13-15, still with
empty
`token_request_uri`) and calls
`self._database.db_engine_spec.get_oauth2_token(...)` (line
35-39). For the legacy engine this resolves to
`DatabricksNativeEngineSpec.get_oauth2_token()`
(`superset/db_engine_specs/databricks.py:65-86`), which executes `if not
config.get("token_request_uri"):` at line 639 and raises `OAuth2Error(...)`
at lines
640-643, aborting the token exchange instead of using any auto-detected
provider/account
endpoint. No tokens are stored and the OAuth2 login flow cannot complete for
the
documented default configuration.
```
</details>
[](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=88f0674e117542b098554f99455905df&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
[](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=88f0674e117542b098554f99455905df&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/db_engine_specs/databricks.py
**Line:** 639:645
**Comment:**
*Incomplete Implementation: This fails OAuth2 code exchange for the
auto-detected flow: `DATABASE_OAUTH2_CLIENTS` can omit `token_request_uri`, but
this method now hard-fails when it is missing. During callback,
`OAuth2StoreTokenCommand` reloads config via `database.get_oauth2_config()`,
which still has an empty token URI for Databricks, so token exchange always
raises `OAuth2Error` instead of using the detected provider/account endpoint.
Resolve `token_request_uri` dynamically (from database `cloud_provider` +
`account_id`/`tenant_id`) before exchange, or carry the resolved token endpoint
through the OAuth state.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41421&comment_hash=77822a7bbd119e9a581b5274ce031f4a20696399f177886590b529f48f524632&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41421&comment_hash=77822a7bbd119e9a581b5274ce031f4a20696399f177886590b529f48f524632&reaction=dislike'>👎</a>
##########
superset/db_engine_specs/databricks.py:
##########
@@ -685,6 +855,74 @@ class
DatabricksPythonConnectorEngineSpec(DatabricksDynamicBaseEngineSpec):
supports_dynamic_schema = supports_catalog = supports_dynamic_catalog =
True
+ # OAuth 2.0 support
+ supports_oauth2 = True
+ oauth2_exception = OAuth2RedirectError
+ oauth2_scope = "sql"
+
+ # OAuth2 endpoints are determined dynamically based on cloud provider
+ oauth2_authorization_request_uri = "" # Set dynamically
+ oauth2_token_request_uri = "" # Set dynamically
+
+ @classmethod
+ def get_oauth2_authorization_uri(
+ cls,
+ config: "OAuth2ClientConfig",
+ state: "OAuth2State",
+ code_verifier: str | None = None,
+ ) -> str:
+ """
+ Return URI for initial OAuth2 request with dynamic endpoint detection.
+
+ A fully-resolved `authorization_request_uri` from
`DATABASE_OAUTH2_CLIENTS`
+ is preserved; only fall back to the auto-detected, account-resolved
+ endpoint when none is configured.
+ """
+ if not config.get("authorization_request_uri"):
+ from superset import db
+ from superset.models.core import Database
+
+ # Get the database to detect cloud provider
+ database_id = state["database_id"]
+ if database := db.session.get(Database, database_id):
+ provider = cls._detect_cloud_provider(database)
+ from typing import cast
+
+ config = cast(
+ "OAuth2ClientConfig",
+ dict(config)
+ | {
+ "authorization_request_uri":
cls._resolve_oauth2_endpoint(
+ database, provider, "authorization_request_uri"
+ )
+ },
+ )
+
+ return super().get_oauth2_authorization_uri(config, state,
code_verifier)
+
+ @classmethod
+ def get_oauth2_token(
+ cls,
+ config: "OAuth2ClientConfig",
+ code: str,
+ code_verifier: str | None = None,
+ ) -> "OAuth2TokenResponse":
+ """
+ Exchange authorization code for refresh/access tokens.
+
+ The token request URI is resolved when the OAuth2 config is built (see
+ `get_oauth2_config`) and already targets the correct cloud provider and
+ account. There is no database context here to auto-detect it, so fail
+ fast rather than POST to an unresolved endpoint when it is missing.
+ """
+ if not config.get("token_request_uri"):
+ raise OAuth2Error(
+ "Databricks OAuth2 token endpoint is not configured: provide a
"
+ "fully-resolved `token_request_uri` in
DATABASE_OAUTH2_CLIENTS."
+ )
+
+ return super().get_oauth2_token(config, code, code_verifier)
Review Comment:
**Suggestion:** The same token-exchange break exists in the Python connector
spec: when admins rely on auto-detection (no explicit `token_request_uri`),
callback token exchange always errors out because this method requires a fully
resolved URI but does not resolve one. Implement the same dynamic endpoint
resolution used for authorization (or pass the resolved token URI from the
initial step) so OAuth2 login can complete. [incomplete implementation]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ Databricks Python connector OAuth2 flow fails by default.
- ❌ Personal tokens for Databricks cannot be stored for users.
- ❌ OAuth2-based per-user Databricks queries cannot complete.
- ⚠️ Docs promise auto-detected token endpoint, code does not.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Configure global Databricks OAuth2 as per
`docs/user_docs_versioned_docs/version-6.0.0/configuration/databases.mdx:5-23`,
defining
`DATABASE_OAUTH2_CLIENTS["Databricks"]` with `id`, `secret`, and `scope` but
**no**
`token_request_uri`, relying on the documented auto-detection (docs lines
39-69 state
endpoints are auto-detected unless explicitly overridden).
2. Create a Databricks database using the `"Databricks"` engine so that
`db_engine_spec`
is `DatabricksPythonConnectorEngineSpec`
(`superset/db_engine_specs/databricks.py:758-763`), configure its hostname
and `ENGINE
PARAMETERS` (`cloud_provider` + `account_id`/`tenant_id`) as suggested in
the docs
(`databases.mdx:45-63`).
3. Run a query that requires OAuth2; `Database.start_oauth2_dance()`
(`superset/models/core.py:27-35`) calls `BaseEngineSpec.start_oauth2_dance()`
(`superset/db_engine_specs/base.py:11-83`), which uses
`database.get_oauth2_config()`
(`models/core.py:15-25`). With only global config,
`BaseEngineSpec.get_oauth2_config()`
(`base.py:86-115`) sets `config["token_request_uri"]` from
`db_engine_spec_config.get("token_request_uri",
cls.oauth2_token_request_uri)`. For
`DatabricksPythonConnectorEngineSpec`, `cls.oauth2_token_request_uri` is
defined as an
empty string (`superset/db_engine_specs/databricks.py:25-27` in its OAuth2
section), so
`config["token_request_uri"]` is `""`. The authorization URL succeeds because
`DatabricksPythonConnectorEngineSpec.get_oauth2_authorization_uri()`
(`databricks.py:24-23
in the later hunk) dynamically resolves only the `authorization_request_uri`.
4. After Databricks redirects back to Superset at `/api/v1/database/oauth2/`
(`superset/databases/api.py:40-88`), `OAuth2StoreTokenCommand.run()`
(`superset/commands/database/oauth2.py:7-58`) reloads `oauth2_config =
self._database.get_oauth2_config()` (line 13-15, still with empty
`token_request_uri`) and
then calls `self._database.db_engine_spec.get_oauth2_token(...)` (line
35-39). For the
Python connector this resolves to
`DatabricksPythonConnectorEngineSpec.get_oauth2_token()`
(`superset/db_engine_specs/databricks.py:25-45 in the second hunk), which
performs `if not
config.get("token_request_uri"):` at line 918 and raises `OAuth2Error(...)`
at lines
919-921. Because `_resolve_oauth2_endpoint()` is never used for
`token_request_uri`, the
token endpoint is never auto-detected and the OAuth2 exchange for the Python
connector
always fails under the documented default configuration.
```
</details>
[](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=c8551f966e7048b2a0c844b4a5ec00e8&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
[](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=c8551f966e7048b2a0c844b4a5ec00e8&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/db_engine_specs/databricks.py
**Line:** 918:924
**Comment:**
*Incomplete Implementation: The same token-exchange break exists in the
Python connector spec: when admins rely on auto-detection (no explicit
`token_request_uri`), callback token exchange always errors out because this
method requires a fully resolved URI but does not resolve one. Implement the
same dynamic endpoint resolution used for authorization (or pass the resolved
token URI from the initial step) so OAuth2 login can complete.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41421&comment_hash=214943266bd026f814ffe42f26db87a987d60b312cc0d88c7236e72ffb622e44&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41421&comment_hash=214943266bd026f814ffe42f26db87a987d60b312cc0d88c7236e72ffb622e44&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]