codeant-ai-for-open-source[bot] commented on code in PR #41421:
URL: https://github.com/apache/superset/pull/41421#discussion_r3479338389
##########
superset/db_engine_specs/databricks.py:
##########
@@ -277,6 +283,102 @@ class
DatabricksDynamicBaseEngineSpec(BasicParametersMixin, DatabricksBaseEngine
"port": "port",
}
+ # OAuth2 endpoints for different cloud providers
+ _oauth2_endpoints = {
+ "aws": {
+ "authorization_request_uri":
"https://accounts.cloud.databricks.com/oidc/accounts/{}/v1/authorize",
+ "token_request_uri":
"https://accounts.cloud.databricks.com/oidc/accounts/{}/v1/token",
+ },
+ "azure": {
+ "authorization_request_uri":
"https://login.microsoftonline.com/{}/oauth2/v2.0/authorize",
+ "token_request_uri":
"https://login.microsoftonline.com/{}/oauth2/v2.0/token",
+ },
+ "gcp": {
+ "authorization_request_uri":
"https://accounts.gcp.databricks.com/oidc/accounts/{}/v1/authorize",
+ "token_request_uri":
"https://accounts.gcp.databricks.com/oidc/accounts/{}/v1/token",
+ },
+ }
+
+ @classmethod
+ def _detect_cloud_provider(cls, database: Database) -> str:
+ """
+ Detect the cloud provider based on the database configuration.
+
+ Returns:
+ str: The cloud provider ('aws', 'azure', or 'gcp')
+ """
+ # Check if cloud provider is explicitly configured in extra
+ if "cloud_provider" in (extra := cls.get_extra_params(database)):
+ provider = extra["cloud_provider"].lower()
+ if provider in cls._oauth2_endpoints:
+ return provider
+
Review Comment:
**Suggestion:** `cloud_provider` is assumed to be a string and `.lower()` is
called unconditionally. If a non-string value is present in engine parameters
(for example a number or boolean from malformed JSON), this raises
`AttributeError` and breaks OAuth2 endpoint resolution instead of gracefully
falling back to hostname detection. Validate type before calling `.lower()` and
ignore invalid values. [type error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ Databricks OAuth2 redirect crashes on malformed cloud_provider.
- ⚠️ Users see server error instead of OAuth login screen.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Configure a Databricks database with OAuth2 enabled so that
`Database.is_oauth2_enabled()` returns True (see
`superset/models/core.py:1318-1335`) and
`DATABASE_OAUTH2_CLIENTS` contains an entry for the Databricks engine (used
by
`BaseEngineSpec.get_oauth2_config` at
`superset/db_engine_specs/base.py:85-115`).
2. In the Databricks database's encrypted extra JSON (loaded in
`Database.get_oauth2_config` at `superset/models/core.py:25-35` and
`DatabricksDynamicBaseEngineSpec.get_extra_params` at
`superset/db_engine_specs/databricks.py:133-153`), set `"cloud_provider":
true` (a JSON
boolean) instead of a string like `"aws"`/`"azure"`/`"gcp"`.
3. Trigger an OAuth2 flow by executing a query or testing the connection so
that
`check_for_oauth2(database)` at `superset/utils/oauth2.py:11-20` catches an
auth error and
calls `database.db_engine_spec.start_oauth2_dance(database)`, which
delegates to
`BaseEngineSpec.start_oauth2_dance` at
`superset/db_engine_specs/base.py:11-83`.
4. In `start_oauth2_dance`, `state["database_id"]` is set and
`oauth2_config` is built,
then `DatabricksNativeEngineSpec.get_oauth2_authorization_uri` at
`superset/db_engine_specs/databricks.py:589-622` calls
`cls._detect_cloud_provider(database)` at
`superset/db_engine_specs/databricks.py:303-77`;
inside `_detect_cloud_provider`, the line `provider =
extra["cloud_provider"].lower()` (PR
hunk lines 312-315) attempts `.lower()` on a non-string (bool), raising
`AttributeError`
and causing the OAuth2 redirect step to fail with a server error instead of
falling back
to hostname-based detection.
```
</details>
[](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=0b13a1764ce94fef9c80ba86d06df89d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
[](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=0b13a1764ce94fef9c80ba86d06df89d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/db_engine_specs/databricks.py
**Line:** 312:315
**Comment:**
*Type Error: `cloud_provider` is assumed to be a string and `.lower()`
is called unconditionally. If a non-string value is present in engine
parameters (for example a number or boolean from malformed JSON), this raises
`AttributeError` and breaks OAuth2 endpoint resolution instead of gracefully
falling back to hostname detection. Validate type before calling `.lower()` and
ignore invalid values.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41421&comment_hash=ea24161e643cb9ef4097fb6065da8a4adce0b2eec818e352967bcac97a0e84de&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41421&comment_hash=ea24161e643cb9ef4097fb6065da8a4adce0b2eec818e352967bcac97a0e84de&reaction=dislike'>👎</a>
##########
superset/db_engine_specs/databricks.py:
##########
@@ -474,6 +576,74 @@ class
DatabricksNativeEngineSpec(DatabricksDynamicBaseEngineSpec):
supports_dynamic_catalog = True
supports_cross_catalog_queries = True
+ # OAuth 2.0 support
+ supports_oauth2 = True
+ oauth2_exception = OAuth2RedirectError
+ oauth2_scope = "sql"
+
+ # OAuth2 endpoints are determined dynamically based on cloud provider
+ oauth2_authorization_request_uri = "" # Set dynamically
+ oauth2_token_request_uri = "" # Set dynamically
+
+ @classmethod
+ def get_oauth2_authorization_uri(
+ cls,
+ config: "OAuth2ClientConfig",
+ state: "OAuth2State",
+ code_verifier: str | None = None,
+ ) -> str:
+ """
+ Return URI for initial OAuth2 request with dynamic endpoint detection.
+
+ A fully-resolved `authorization_request_uri` from
`DATABASE_OAUTH2_CLIENTS`
+ is preserved; only fall back to the auto-detected, account-resolved
+ endpoint when none is configured.
+ """
+ if not config.get("authorization_request_uri"):
+ from superset import db
+ from superset.models.core import Database
+
+ # Get the database to detect cloud provider
+ database_id = state["database_id"]
+ if database := db.session.get(Database, database_id):
+ provider = cls._detect_cloud_provider(database)
+ from typing import cast
+
+ config = cast(
+ "OAuth2ClientConfig",
+ dict(config)
+ | {
+ "authorization_request_uri":
cls._resolve_oauth2_endpoint(
+ database, provider, "authorization_request_uri"
+ )
+ },
+ )
+
+ return super().get_oauth2_authorization_uri(config, state,
code_verifier)
Review Comment:
**Suggestion:** When no `authorization_request_uri` is configured and the
database lookup by `state["database_id"]` fails, the method silently falls
through and calls the base implementation with an empty authorization URI. That
generates an invalid relative redirect URL instead of a provider URL, breaking
the OAuth2 flow. Add an explicit error when the database cannot be loaded for
auto-resolution. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ Legacy Databricks OAuth2 may redirect to malformed URL.
- ⚠️ Users cannot complete OAuth login when state is stale.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Configure OAuth2 for the legacy Databricks native engine (engine name
configured for
`DatabricksNativeEngineSpec` in `DATABASE_OAUTH2_CLIENTS`) without specifying
`authorization_request_uri`, so `BaseEngineSpec.get_oauth2_config` at
`superset/db_engine_specs/base.py:85-115` builds a config where
`"authorization_request_uri"` is taken from
`DatabricksNativeEngineSpec.oauth2_authorization_request_uri`, which is set
to the empty
string at `superset/db_engine_specs/databricks.py:20-27`.
2. Start an OAuth2 flow by running a query that hits OAuth-protected
Databricks, causing
`check_for_oauth2(database)` at `superset/utils/oauth2.py:11-20` to call
`database.db_engine_spec.start_oauth2_dance(database)`, which delegates to
`BaseEngineSpec.start_oauth2_dance` at
`superset/db_engine_specs/base.py:11-83` and
constructs an OAuth2 `state` dict that includes `"database_id": database.id`.
3. In a test or misconfigured environment, invoke
`DatabricksNativeEngineSpec.get_oauth2_authorization_uri` at
`superset/db_engine_specs/databricks.py:589-622` with an
`OAuth2ClientConfig` whose
`"authorization_request_uri"` is empty (as built in step 1) and a `state`
whose
`"database_id"` does not resolve via `db.session.get(Database, database_id)`
(for example,
by passing a stale or non-existent ID), so the `if database :=
db.session.get(...)` block
at lines 607-620 is skipped and `config["authorization_request_uri"]`
remains the empty
string.
4. The method then falls through to
`super().get_oauth2_authorization_uri(config, state,
code_verifier)`, which in `BaseEngineSpec.get_oauth2_authorization_uri` at
`superset/db_engine_specs/base.py:117-146` reads `uri =
config["authorization_request_uri"]` (empty) and returns `urljoin(uri, "?" +
urlencode(params))`, yielding a relative URL starting with `?` instead of a
Databricks
provider URL; this malformed URL is then wrapped in an `OAuth2RedirectError`
by
`start_oauth2_dance`, leading the frontend to redirect to an invalid OAuth2
authorization
URL instead of failing fast with a clear error.
```
</details>
[](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=c0238da940434b008761151fae14151d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
[](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=c0238da940434b008761151fae14151d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/db_engine_specs/databricks.py
**Line:** 602:622
**Comment:**
*Logic Error: When no `authorization_request_uri` is configured and the
database lookup by `state["database_id"]` fails, the method silently falls
through and calls the base implementation with an empty authorization URI. That
generates an invalid relative redirect URL instead of a provider URL, breaking
the OAuth2 flow. Add an explicit error when the database cannot be loaded for
auto-resolution.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41421&comment_hash=2be0cf4e8a033232c7c5b727ba1e3b5109f3ef52fe97812f7fc15f15f90695ac&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41421&comment_hash=2be0cf4e8a033232c7c5b727ba1e3b5109f3ef52fe97812f7fc15f15f90695ac&reaction=dislike'>👎</a>
##########
superset/db_engine_specs/databricks.py:
##########
@@ -685,6 +855,74 @@ class
DatabricksPythonConnectorEngineSpec(DatabricksDynamicBaseEngineSpec):
supports_dynamic_schema = supports_catalog = supports_dynamic_catalog =
True
+ # OAuth 2.0 support
+ supports_oauth2 = True
+ oauth2_exception = OAuth2RedirectError
+ oauth2_scope = "sql"
+
+ # OAuth2 endpoints are determined dynamically based on cloud provider
+ oauth2_authorization_request_uri = "" # Set dynamically
+ oauth2_token_request_uri = "" # Set dynamically
+
+ @classmethod
+ def get_oauth2_authorization_uri(
+ cls,
+ config: "OAuth2ClientConfig",
+ state: "OAuth2State",
+ code_verifier: str | None = None,
+ ) -> str:
+ """
+ Return URI for initial OAuth2 request with dynamic endpoint detection.
+
+ A fully-resolved `authorization_request_uri` from
`DATABASE_OAUTH2_CLIENTS`
+ is preserved; only fall back to the auto-detected, account-resolved
+ endpoint when none is configured.
+ """
+ if not config.get("authorization_request_uri"):
+ from superset import db
+ from superset.models.core import Database
+
+ # Get the database to detect cloud provider
+ database_id = state["database_id"]
+ if database := db.session.get(Database, database_id):
+ provider = cls._detect_cloud_provider(database)
+ from typing import cast
+
+ config = cast(
+ "OAuth2ClientConfig",
+ dict(config)
+ | {
+ "authorization_request_uri":
cls._resolve_oauth2_endpoint(
+ database, provider, "authorization_request_uri"
+ )
+ },
+ )
+
+ return super().get_oauth2_authorization_uri(config, state,
code_verifier)
Review Comment:
**Suggestion:** The Python connector OAuth2 authorization path has the same
silent fallthrough: if database lookup fails during endpoint auto-detection, it
calls the base method with an empty URI and builds an invalid redirect target.
This should fail fast with a clear OAuth2 error instead of producing a
malformed authorization URL. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ Databricks Python connector OAuth2 redirect can be malformed.
- ⚠️ Users may see invalid redirect instead of clear OAuth2 error.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Configure OAuth2 for the primary Databricks Python connector engine
(`DatabricksPythonConnectorEngineSpec`) in `DATABASE_OAUTH2_CLIENTS` without
an
`authorization_request_uri`, so `BaseEngineSpec.get_oauth2_config` at
`superset/db_engine_specs/base.py:85-115` builds a config where
`"authorization_request_uri"` is taken from
`DatabricksPythonConnectorEngineSpec.oauth2_authorization_request_uri`,
which is set to
the empty string at `superset/db_engine_specs/databricks.py:299-306`.
2. Trigger OAuth2 for a Databricks connection using the Python connector so
that
`check_for_oauth2(database)` at `superset/utils/oauth2.py:11-20` calls
`database.db_engine_spec.start_oauth2_dance(database)`, which in turn invokes
`BaseEngineSpec.start_oauth2_dance` at
`superset/db_engine_specs/base.py:11-83` and
constructs `state["database_id"]` from `database.id`.
3. In a test or misconfigured scenario, call
`DatabricksPythonConnectorEngineSpec.get_oauth2_authorization_uri` at
`superset/db_engine_specs/databricks.py:308-343` (PR hunk lines 881-901)
with a config
whose `"authorization_request_uri"` is empty and a `state` whose
`"database_id"` does not
resolve via `db.session.get(Database, database_id)` (e.g., non-existent or
stale), causing
the `if database := db.session.get(...)` block at lines 886-900 to be
skipped and leaving
`config["authorization_request_uri"]` as an empty string.
4. The method then returns `super().get_oauth2_authorization_uri(config,
state,
code_verifier)`, which in `BaseEngineSpec.get_oauth2_authorization_uri` at
`superset/db_engine_specs/base.py:117-146` uses the empty
`authorization_request_uri`,
producing a relative URL like `?scope=...` instead of a Databricks provider
URL; when
wrapped in an `OAuth2RedirectError` this yields a broken redirect target for
Databricks
OAuth2 via the Python connector instead of a clear `OAuth2Error` explaining
that the
database context could not be resolved.
```
</details>
[](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=b301feb140b144dfb1c370430e166cc2&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
[](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=b301feb140b144dfb1c370430e166cc2&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/db_engine_specs/databricks.py
**Line:** 881:901
**Comment:**
*Logic Error: The Python connector OAuth2 authorization path has the
same silent fallthrough: if database lookup fails during endpoint
auto-detection, it calls the base method with an empty URI and builds an
invalid redirect target. This should fail fast with a clear OAuth2 error
instead of producing a malformed authorization URL.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41421&comment_hash=2af0e80ac6c86457708a5155e7d509ced3e67de216fe959160b38b85a7318167&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41421&comment_hash=2af0e80ac6c86457708a5155e7d509ced3e67de216fe959160b38b85a7318167&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]