ataulmujeeb-cyber opened a new issue, #63285:
URL: https://github.com/apache/airflow/issues/63285

   ### Apache Airflow Provider(s)
   
   databricks
   
   ### Versions of Apache Airflow Providers
   
   - `apache-airflow-providers-databricks==7.9.1` (confirmed); likely also affects the latest release, `7.10.0`
   
   
   ### Apache Airflow version
   
   3.1-11
   
   ### Operating System
   
   ubuntu-latest
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   Astro Runtime 3.1-11
   
   ### What happened
   
   The bug is in [`DatabricksSqlHook._get_sql_endpoint_by_name()`](https://github.com/apache/airflow/blob/main/providers/databricks/src/airflow/providers/databricks/hooks/databricks_sql.py#L121-L130):
   
   ```python
   def _get_sql_endpoint_by_name(self, endpoint_name) -> dict[str, Any]:
       result = self._do_api_call(LIST_SQL_ENDPOINTS_ENDPOINT)
       if "endpoints" not in result:  # <--- BUG HERE
           raise AirflowException("Can't list Databricks SQL endpoints")
       try:
           endpoint = next(endpoint for endpoint in result["endpoints"] if endpoint["name"] == endpoint_name)
       except StopIteration:
           raise AirflowException(f"Can't find Databricks SQL endpoint with name '{endpoint_name}'")
       else:
           return endpoint
   ```

   The `LIST_SQL_ENDPOINTS_ENDPOINT` constant is defined as:

   ```python
   LIST_SQL_ENDPOINTS_ENDPOINT = ("GET", "2.0/sql/warehouses")
   ```
   
   This means the hook calls the current Databricks REST API path `GET /api/2.0/sql/warehouses`. However, the response key changed when Databricks renamed "SQL endpoints" to "SQL warehouses" (in 2023):

   | API Path | Response JSON Key |
   |---|---|
   | `GET /api/2.0/sql/endpoints` (legacy) | `"endpoints"` |
   | `GET /api/2.0/sql/warehouses` (current) | `"warehouses"` |

   The code calls the new path (`2.0/sql/warehouses`) but checks for the old response key (`"endpoints"`). Since the new API returns `"warehouses"` as the top-level key, the check `if "endpoints" not in result` is always true, and the method always raises `AirflowException("Can't list Databricks SQL endpoints")`.
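
   The failure mode can be reproduced in isolation with a mock of the current response shape (the payload below is illustrative, not real API output):

   ```python
   # Mock of the current GET /api/2.0/sql/warehouses response shape
   # (field values are illustrative, not taken from a real workspace).
   result = {
       "warehouses": [
           {"name": "My SQL Warehouse", "odbc_params": {"path": "/sql/1.0/warehouses/abc"}},
       ]
   }

   # The hook's guard looks for the legacy key, so it trips on every
   # current-format response and the exception branch is always taken:
   print("endpoints" not in result)  # True
   ```

   No matter how many warehouses the workspace has, the guard never sees an `"endpoints"` key, so the lookup by name is unreachable.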
   
   **Full traceback**

   ```
   AirflowException: Can't list Databricks SQL endpoints

   File "/usr/local/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 1004 in run
   File "/usr/local/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 1405 in _execute_task
   File "/usr/local/lib/python3.12/site-packages/airflow/sdk/bases/operator.py", line 417 in wrapper
   File "/usr/local/lib/python3.12/site-packages/airflow/sdk/bases/sensor.py", line 227 in execute
   File "/usr/local/lib/python3.12/site-packages/airflow/sdk/bases/sensor.py", line 207 in execute
   File "/usr/local/lib/python3.12/site-packages/airflow/providers/databricks/sensors/databricks_sql.py", line 135 in poke
   File "/usr/local/lib/python3.12/site-packages/airflow/providers/databricks/sensors/databricks_sql.py", line 126 in _get_results
   File "/usr/local/lib/python3.12/site-packages/airflow/providers/databricks/hooks/databricks_sql.py", line 288 in run
   File "/usr/local/lib/python3.12/site-packages/airflow/providers/databricks/hooks/databricks_sql.py", line 136 in get_conn
   File "/usr/local/lib/python3.12/site-packages/airflow/providers/databricks/hooks/databricks_sql.py", line 124 in _get_sql_endpoint_by_name
   ```
   
   **Workaround**

   Replace `sql_warehouse_name` with the explicit `http_path` parameter:

   ```python
   # Fails: always raises "Can't list Databricks SQL endpoints"
   DatabricksSqlSensor(
       sql_warehouse_name="SQL Warehouse - abc - prod",
       sql="SELECT 1",
       ...
   )

   # Works: bypasses _get_sql_endpoint_by_name entirely
   DatabricksSqlSensor(
       http_path="/sql/1.0/warehouses/abc",
       sql="SELECT 1",
       ...
   )
   ```
   
   ### What you think should happen instead
   
   
   
   The `_get_sql_endpoint_by_name` method should check for both the `"warehouses"` and `"endpoints"` keys in the API response, so that both the current and legacy response formats are handled. Suggested fix:

   ```python
   def _get_sql_endpoint_by_name(self, endpoint_name) -> dict[str, Any]:
       result = self._do_api_call(LIST_SQL_ENDPOINTS_ENDPOINT)
       endpoints = result.get("warehouses") or result.get("endpoints")
       if not endpoints:
           raise AirflowException("Can't list Databricks SQL warehouses/endpoints")
       try:
           endpoint = next(ep for ep in endpoints if ep["name"] == endpoint_name)
       except StopIteration:
           raise AirflowException(f"Can't find Databricks SQL warehouse with name '{endpoint_name}'")
       else:
           return endpoint
   ```
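
   The key-fallback lookup from the suggested fix can be checked on its own against both response shapes (the helper name and payloads below are illustrative, not part of the provider):

   ```python
   # Standalone check of the key-fallback logic from the suggested fix.
   # _pick_endpoints is a hypothetical helper; the payloads are illustrative.
   def _pick_endpoints(result: dict) -> list:
       return result.get("warehouses") or result.get("endpoints") or []

   current = {"warehouses": [{"name": "My SQL Warehouse"}]}  # new API shape
   legacy = {"endpoints": [{"name": "My SQL Warehouse"}]}    # old API shape

   # Both shapes resolve to the same warehouse list:
   assert _pick_endpoints(current) == _pick_endpoints(legacy)
   # An empty or unexpected response still triggers the error branch:
   assert _pick_endpoints({}) == []
   ```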
   
   ### How to reproduce
   
   1. Configure a Databricks connection in Airflow (any valid workspace)
     2. Create a DatabricksSqlSensor using sql_warehouse_name:
     DatabricksSqlSensor(
         task_id="test_sensor",
         databricks_conn_id="databricks_default",
         sql_warehouse_name="My SQL Warehouse",
         sql="SELECT 1",
     )
     3. Run the DAG — the sensor always fails with AirflowException: Can't list 
Databricks SQL endpoints
     4. This is 100% reproducible because the response key mismatch is 
deterministic
   
   **Anything else**

   - This affects all operators and sensors that use `sql_warehouse_name` / `sql_endpoint_name` to resolve a warehouse, including `DatabricksSqlSensor`, `DatabricksPartitionSensor`, and any usage of `DatabricksSqlHook` with the `sql_endpoint_name` parameter.
   - The `LIST_SQL_ENDPOINTS_ENDPOINT` constant was updated to use the new API path (`2.0/sql/warehouses`), but the response parsing was not updated to match the new response key (`"warehouses"` instead of `"endpoints"`).
   - Many production DAGs use `sql_warehouse_name` for readability and would all need to be refactored to use `http_path` as a workaround.
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
