codeant-ai-for-open-source[bot] commented on code in PR #41421:
URL: https://github.com/apache/superset/pull/41421#discussion_r3479338389


##########
superset/db_engine_specs/databricks.py:
##########
@@ -277,6 +283,102 @@ class 
DatabricksDynamicBaseEngineSpec(BasicParametersMixin, DatabricksBaseEngine
         "port": "port",
     }
 
+    # OAuth2 endpoints for different cloud providers
+    _oauth2_endpoints = {
+        "aws": {
+            "authorization_request_uri": 
"https://accounts.cloud.databricks.com/oidc/accounts/{}/v1/authorize";,
+            "token_request_uri": 
"https://accounts.cloud.databricks.com/oidc/accounts/{}/v1/token";,
+        },
+        "azure": {
+            "authorization_request_uri": 
"https://login.microsoftonline.com/{}/oauth2/v2.0/authorize";,
+            "token_request_uri": 
"https://login.microsoftonline.com/{}/oauth2/v2.0/token";,
+        },
+        "gcp": {
+            "authorization_request_uri": 
"https://accounts.gcp.databricks.com/oidc/accounts/{}/v1/authorize";,
+            "token_request_uri": 
"https://accounts.gcp.databricks.com/oidc/accounts/{}/v1/token";,
+        },
+    }
+
+    @classmethod
+    def _detect_cloud_provider(cls, database: Database) -> str:
+        """
+        Detect the cloud provider based on the database configuration.
+
+        Returns:
+            str: The cloud provider ('aws', 'azure', or 'gcp')
+        """
+        # Check if cloud provider is explicitly configured in extra
+        if "cloud_provider" in (extra := cls.get_extra_params(database)):
+            provider = extra["cloud_provider"].lower()
+            if provider in cls._oauth2_endpoints:
+                return provider
+

Review Comment:
   **Suggestion:** `cloud_provider` is assumed to be a string and `.lower()` is 
called unconditionally. If a non-string value is present in engine parameters 
(for example a number or boolean from malformed JSON), this raises 
`AttributeError` and breaks OAuth2 endpoint resolution instead of gracefully 
falling back to hostname detection. Validate type before calling `.lower()` and 
ignore invalid values. [type error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Databricks OAuth2 redirect crashes on malformed cloud_provider.
   - ⚠️ Users see server error instead of OAuth login screen.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure a Databricks database with OAuth2 enabled so that
   `Database.is_oauth2_enabled()` returns True (see 
`superset/models/core.py:1318-1335`) and
   `DATABASE_OAUTH2_CLIENTS` contains an entry for the Databricks engine (used 
by
   `BaseEngineSpec.get_oauth2_config` at 
`superset/db_engine_specs/base.py:85-115`).
   
   2. In the Databricks database's encrypted extra JSON (loaded in
   `Database.get_oauth2_config` at `superset/models/core.py:25-35` and
   `DatabricksDynamicBaseEngineSpec.get_extra_params` at
   `superset/db_engine_specs/databricks.py:133-153`), set `"cloud_provider": 
true` (a JSON
   boolean) instead of a string like `"aws"`/`"azure"`/`"gcp"`.
   
   3. Trigger an OAuth2 flow by executing a query or testing the connection so 
that
   `check_for_oauth2(database)` at `superset/utils/oauth2.py:11-20` catches an 
auth error and
   calls `database.db_engine_spec.start_oauth2_dance(database)`, which 
delegates to
   `BaseEngineSpec.start_oauth2_dance` at 
`superset/db_engine_specs/base.py:11-83`.
   
   4. In `start_oauth2_dance`, `state["database_id"]` is set and 
`oauth2_config` is built,
   then `DatabricksNativeEngineSpec.get_oauth2_authorization_uri` at
   `superset/db_engine_specs/databricks.py:589-622` calls
   `cls._detect_cloud_provider(database)` at 
`superset/db_engine_specs/databricks.py:303-77`;
   inside `_detect_cloud_provider`, the line `provider = 
extra["cloud_provider"].lower()` (PR
   hunk lines 312-315) attempts `.lower()` on a non-string (bool), raising 
`AttributeError`
   and causing the OAuth2 redirect step to fail with a server error instead of 
falling back
   to hostname-based detection.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=0b13a1764ce94fef9c80ba86d06df89d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=0b13a1764ce94fef9c80ba86d06df89d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/databricks.py
   **Line:** 312:315
   **Comment:**
        *Type Error: `cloud_provider` is assumed to be a string and `.lower()` 
is called unconditionally. If a non-string value is present in engine 
parameters (for example a number or boolean from malformed JSON), this raises 
`AttributeError` and breaks OAuth2 endpoint resolution instead of gracefully 
falling back to hostname detection. Validate type before calling `.lower()` and 
ignore invalid values.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41421&comment_hash=ea24161e643cb9ef4097fb6065da8a4adce0b2eec818e352967bcac97a0e84de&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41421&comment_hash=ea24161e643cb9ef4097fb6065da8a4adce0b2eec818e352967bcac97a0e84de&reaction=dislike'>👎</a>



##########
superset/db_engine_specs/databricks.py:
##########
@@ -474,6 +576,74 @@ class 
DatabricksNativeEngineSpec(DatabricksDynamicBaseEngineSpec):
     supports_dynamic_catalog = True
     supports_cross_catalog_queries = True
 
+    # OAuth 2.0 support
+    supports_oauth2 = True
+    oauth2_exception = OAuth2RedirectError
+    oauth2_scope = "sql"
+
+    # OAuth2 endpoints are determined dynamically based on cloud provider
+    oauth2_authorization_request_uri = ""  # Set dynamically
+    oauth2_token_request_uri = ""  # Set dynamically
+
+    @classmethod
+    def get_oauth2_authorization_uri(
+        cls,
+        config: "OAuth2ClientConfig",
+        state: "OAuth2State",
+        code_verifier: str | None = None,
+    ) -> str:
+        """
+        Return URI for initial OAuth2 request with dynamic endpoint detection.
+
+        A fully-resolved `authorization_request_uri` from 
`DATABASE_OAUTH2_CLIENTS`
+        is preserved; only fall back to the auto-detected, account-resolved
+        endpoint when none is configured.
+        """
+        if not config.get("authorization_request_uri"):
+            from superset import db
+            from superset.models.core import Database
+
+            # Get the database to detect cloud provider
+            database_id = state["database_id"]
+            if database := db.session.get(Database, database_id):
+                provider = cls._detect_cloud_provider(database)
+                from typing import cast
+
+                config = cast(
+                    "OAuth2ClientConfig",
+                    dict(config)
+                    | {
+                        "authorization_request_uri": 
cls._resolve_oauth2_endpoint(
+                            database, provider, "authorization_request_uri"
+                        )
+                    },
+                )
+
+        return super().get_oauth2_authorization_uri(config, state, 
code_verifier)

Review Comment:
   **Suggestion:** When no `authorization_request_uri` is configured and the 
database lookup by `state["database_id"]` fails, the method silently falls 
through and calls the base implementation with an empty authorization URI. That 
generates an invalid relative redirect URL instead of a provider URL, breaking 
the OAuth2 flow. Add an explicit error when the database cannot be loaded for 
auto-resolution. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Legacy Databricks OAuth2 may redirect to malformed URL.
   - ⚠️ Users cannot complete OAuth login when state is stale.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure OAuth2 for the legacy Databricks native engine (engine name 
configured for
   `DatabricksNativeEngineSpec` in `DATABASE_OAUTH2_CLIENTS`) without specifying
   `authorization_request_uri`, so `BaseEngineSpec.get_oauth2_config` at
   `superset/db_engine_specs/base.py:85-115` builds a config where
   `"authorization_request_uri"` is taken from
   `DatabricksNativeEngineSpec.oauth2_authorization_request_uri`, which is set 
to the empty
   string at `superset/db_engine_specs/databricks.py:20-27`.
   
   2. Start an OAuth2 flow by running a query that hits OAuth-protected 
Databricks, causing
   `check_for_oauth2(database)` at `superset/utils/oauth2.py:11-20` to call
   `database.db_engine_spec.start_oauth2_dance(database)`, which delegates to
   `BaseEngineSpec.start_oauth2_dance` at 
`superset/db_engine_specs/base.py:11-83` and
   constructs an OAuth2 `state` dict that includes `"database_id": database.id`.
   
   3. In a test or misconfigured environment, invoke
   `DatabricksNativeEngineSpec.get_oauth2_authorization_uri` at
   `superset/db_engine_specs/databricks.py:589-622` with an 
`OAuth2ClientConfig` whose
   `"authorization_request_uri"` is empty (as built in step 1) and a `state` 
whose
   `"database_id"` does not resolve via `db.session.get(Database, database_id)` 
(for example,
   by passing a stale or non-existent ID), so the `if database := 
db.session.get(...)` block
   at lines 607-620 is skipped and `config["authorization_request_uri"]` 
remains the empty
   string.
   
   4. The method then falls through to 
`super().get_oauth2_authorization_uri(config, state,
   code_verifier)`, which in `BaseEngineSpec.get_oauth2_authorization_uri` at
   `superset/db_engine_specs/base.py:117-146` reads `uri =
   config["authorization_request_uri"]` (empty) and returns `urljoin(uri, "?" +
   urlencode(params))`, yielding a relative URL starting with `?` instead of a 
Databricks
   provider URL; this malformed URL is then wrapped in an `OAuth2RedirectError` 
by
   `start_oauth2_dance`, leading the frontend to redirect to an invalid OAuth2 
authorization
   URL instead of failing fast with a clear error.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=c0238da940434b008761151fae14151d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=c0238da940434b008761151fae14151d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/databricks.py
   **Line:** 602:622
   **Comment:**
        *Logic Error: When no `authorization_request_uri` is configured and the 
database lookup by `state["database_id"]` fails, the method silently falls 
through and calls the base implementation with an empty authorization URI. That 
generates an invalid relative redirect URL instead of a provider URL, breaking 
the OAuth2 flow. Add an explicit error when the database cannot be loaded for 
auto-resolution.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41421&comment_hash=2be0cf4e8a033232c7c5b727ba1e3b5109f3ef52fe97812f7fc15f15f90695ac&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41421&comment_hash=2be0cf4e8a033232c7c5b727ba1e3b5109f3ef52fe97812f7fc15f15f90695ac&reaction=dislike'>👎</a>



##########
superset/db_engine_specs/databricks.py:
##########
@@ -685,6 +855,74 @@ class 
DatabricksPythonConnectorEngineSpec(DatabricksDynamicBaseEngineSpec):
 
     supports_dynamic_schema = supports_catalog = supports_dynamic_catalog = 
True
 
+    # OAuth 2.0 support
+    supports_oauth2 = True
+    oauth2_exception = OAuth2RedirectError
+    oauth2_scope = "sql"
+
+    # OAuth2 endpoints are determined dynamically based on cloud provider
+    oauth2_authorization_request_uri = ""  # Set dynamically
+    oauth2_token_request_uri = ""  # Set dynamically
+
+    @classmethod
+    def get_oauth2_authorization_uri(
+        cls,
+        config: "OAuth2ClientConfig",
+        state: "OAuth2State",
+        code_verifier: str | None = None,
+    ) -> str:
+        """
+        Return URI for initial OAuth2 request with dynamic endpoint detection.
+
+        A fully-resolved `authorization_request_uri` from 
`DATABASE_OAUTH2_CLIENTS`
+        is preserved; only fall back to the auto-detected, account-resolved
+        endpoint when none is configured.
+        """
+        if not config.get("authorization_request_uri"):
+            from superset import db
+            from superset.models.core import Database
+
+            # Get the database to detect cloud provider
+            database_id = state["database_id"]
+            if database := db.session.get(Database, database_id):
+                provider = cls._detect_cloud_provider(database)
+                from typing import cast
+
+                config = cast(
+                    "OAuth2ClientConfig",
+                    dict(config)
+                    | {
+                        "authorization_request_uri": 
cls._resolve_oauth2_endpoint(
+                            database, provider, "authorization_request_uri"
+                        )
+                    },
+                )
+
+        return super().get_oauth2_authorization_uri(config, state, 
code_verifier)

Review Comment:
   **Suggestion:** The Python connector OAuth2 authorization path has the same 
silent fallthrough: if database lookup fails during endpoint auto-detection, it 
calls the base method with an empty URI and builds an invalid redirect target. 
This should fail fast with a clear OAuth2 error instead of producing a 
malformed authorization URL. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Databricks Python connector OAuth2 redirect can be malformed.
   - ⚠️ Users may see invalid redirect instead of clear OAuth2 error.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure OAuth2 for the primary Databricks Python connector engine
   (`DatabricksPythonConnectorEngineSpec`) in `DATABASE_OAUTH2_CLIENTS` without 
an
   `authorization_request_uri`, so `BaseEngineSpec.get_oauth2_config` at
   `superset/db_engine_specs/base.py:85-115` builds a config where
   `"authorization_request_uri"` is taken from
   `DatabricksPythonConnectorEngineSpec.oauth2_authorization_request_uri`, 
which is set to
   the empty string at `superset/db_engine_specs/databricks.py:299-306`.
   
   2. Trigger OAuth2 for a Databricks connection using the Python connector so 
that
   `check_for_oauth2(database)` at `superset/utils/oauth2.py:11-20` calls
   `database.db_engine_spec.start_oauth2_dance(database)`, which in turn invokes
   `BaseEngineSpec.start_oauth2_dance` at 
`superset/db_engine_specs/base.py:11-83` and
   constructs `state["database_id"]` from `database.id`.
   
   3. In a test or misconfigured scenario, call
   `DatabricksPythonConnectorEngineSpec.get_oauth2_authorization_uri` at
   `superset/db_engine_specs/databricks.py:308-343` (PR hunk lines 881-901) 
with a config
   whose `"authorization_request_uri"` is empty and a `state` whose 
`"database_id"` does not
   resolve via `db.session.get(Database, database_id)` (e.g., non-existent or 
stale), causing
   the `if database := db.session.get(...)` block at lines 886-900 to be 
skipped and leaving
   `config["authorization_request_uri"]` as an empty string.
   
   4. The method then returns `super().get_oauth2_authorization_uri(config, 
state,
   code_verifier)`, which in `BaseEngineSpec.get_oauth2_authorization_uri` at
   `superset/db_engine_specs/base.py:117-146` uses the empty 
`authorization_request_uri`,
   producing a relative URL like `?scope=...` instead of a Databricks provider 
URL; when
   wrapped in an `OAuth2RedirectError` this yields a broken redirect target for 
Databricks
   OAuth2 via the Python connector instead of a clear `OAuth2Error` explaining 
that the
   database context could not be resolved.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=b301feb140b144dfb1c370430e166cc2&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=b301feb140b144dfb1c370430e166cc2&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/databricks.py
   **Line:** 881:901
   **Comment:**
        *Logic Error: The Python connector OAuth2 authorization path has the 
same silent fallthrough: if database lookup fails during endpoint 
auto-detection, it calls the base method with an empty URI and builds an 
invalid redirect target. This should fail fast with a clear OAuth2 error 
instead of producing a malformed authorization URL.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41421&comment_hash=2af0e80ac6c86457708a5155e7d509ced3e67de216fe959160b38b85a7318167&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41421&comment_hash=2af0e80ac6c86457708a5155e7d509ced3e67de216fe959160b38b85a7318167&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to