ataulmujeeb-cyber opened a new issue, #63285:
URL: https://github.com/apache/airflow/issues/63285
### Apache Airflow Provider(s)
databricks
### Versions of Apache Airflow Providers
- `apache-airflow-providers-databricks==7.9.1` (confirmed); likely also affects the latest `7.10.0`
### Apache Airflow version
3.1-11
### Operating System
ubuntu-latest
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
Astro Runtime 3.1-11
### What happened
The bug is in [`DatabricksSqlHook._get_sql_endpoint_by_name()`](https://github.com/apache/airflow/blob/main/providers/databricks/src/airflow/providers/databricks/hooks/databricks_sql.py#L121-L130):
```python
def _get_sql_endpoint_by_name(self, endpoint_name) -> dict[str, Any]:
    result = self._do_api_call(LIST_SQL_ENDPOINTS_ENDPOINT)
    if "endpoints" not in result:  # <--- BUG HERE
        raise AirflowException("Can't list Databricks SQL endpoints")
    try:
        endpoint = next(endpoint for endpoint in result["endpoints"] if endpoint["name"] == endpoint_name)
    except StopIteration:
        raise AirflowException(f"Can't find Databricks SQL endpoint with name '{endpoint_name}'")
    else:
        return endpoint
```
The `LIST_SQL_ENDPOINTS_ENDPOINT` constant is defined as:

```python
LIST_SQL_ENDPOINTS_ENDPOINT = ("GET", "2.0/sql/warehouses")
```

This means the hook calls the new Databricks REST API path `GET /api/2.0/sql/warehouses`. However, the response key changed when Databricks renamed "SQL endpoints" to "SQL warehouses" (in 2023):
| API Path | Response JSON Key |
|---|---|
| `GET /api/2.0/sql/endpoints` (legacy) | `"endpoints"` |
| `GET /api/2.0/sql/warehouses` (current) | `"warehouses"` |
The code calls the new path (`2.0/sql/warehouses`) but checks for the old response key (`"endpoints"`). Since the new API returns `"warehouses"` as the top-level key, the check `if "endpoints" not in result` is always true, and the method always raises `AirflowException("Can't list Databricks SQL endpoints")`.
**Full Traceback**

```
AirflowException: Can't list Databricks SQL endpoints
  File "/usr/local/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 1004, in run
  File "/usr/local/lib/python3.12/site-packages/airflow/sdk/execution_time/task_runner.py", line 1405, in _execute_task
  File "/usr/local/lib/python3.12/site-packages/airflow/sdk/bases/operator.py", line 417, in wrapper
  File "/usr/local/lib/python3.12/site-packages/airflow/sdk/bases/sensor.py", line 227, in execute
  File "/usr/local/lib/python3.12/site-packages/airflow/sdk/bases/sensor.py", line 207, in execute
  File "/usr/local/lib/python3.12/site-packages/airflow/providers/databricks/sensors/databricks_sql.py", line 135, in poke
  File "/usr/local/lib/python3.12/site-packages/airflow/providers/databricks/sensors/databricks_sql.py", line 126, in _get_results
  File "/usr/local/lib/python3.12/site-packages/airflow/providers/databricks/hooks/databricks_sql.py", line 288, in run
  File "/usr/local/lib/python3.12/site-packages/airflow/providers/databricks/hooks/databricks_sql.py", line 136, in get_conn
  File "/usr/local/lib/python3.12/site-packages/airflow/providers/databricks/hooks/databricks_sql.py", line 124, in _get_sql_endpoint_by_name
```
**Workaround**

Replace `sql_warehouse_name` with the explicit `http_path` parameter:

```python
# Fails — always raises "Can't list Databricks SQL endpoints":
DatabricksSqlSensor(
    sql_warehouse_name="SQL Warehouse - abc - prod",
    sql="SELECT 1",
    ...
)

# Works — bypasses _get_sql_endpoint_by_name entirely:
DatabricksSqlSensor(
    http_path="/sql/1.0/warehouses/abc",
    sql="SELECT 1",
    ...
)
```
### What you think should happen instead
The `_get_sql_endpoint_by_name` method should check for both the `"warehouses"` and `"endpoints"` keys in the API response to handle both the new and legacy API response formats.

Suggested fix:

```python
def _get_sql_endpoint_by_name(self, endpoint_name) -> dict[str, Any]:
    result = self._do_api_call(LIST_SQL_ENDPOINTS_ENDPOINT)
    endpoints = result.get("warehouses") or result.get("endpoints")
    if not endpoints:
        raise AirflowException("Can't list Databricks SQL warehouses/endpoints")
    try:
        endpoint = next(ep for ep in endpoints if ep["name"] == endpoint_name)
    except StopIteration:
        raise AirflowException(f"Can't find Databricks SQL warehouse with name '{endpoint_name}'")
    else:
        return endpoint
```
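As a sanity check, the fallback parsing from the suggested fix can be exercised standalone against mock payloads for both response shapes (the payloads below are illustrative, not real API captures):

```python
def resolve_warehouse(result, name):
    """Standalone sketch of the suggested fallback parsing."""
    endpoints = result.get("warehouses") or result.get("endpoints")
    if not endpoints:
        raise RuntimeError("Can't list Databricks SQL warehouses/endpoints")
    try:
        return next(ep for ep in endpoints if ep["name"] == name)
    except StopIteration:
        raise RuntimeError(f"Can't find Databricks SQL warehouse with name {name!r}")

new_style = {"warehouses": [{"name": "wh", "id": "1"}]}  # current API shape
old_style = {"endpoints": [{"name": "wh", "id": "2"}]}   # legacy API shape

assert resolve_warehouse(new_style, "wh")["id"] == "1"
assert resolve_warehouse(old_style, "wh")["id"] == "2"
```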
### How to reproduce
1. Configure a Databricks connection in Airflow (any valid workspace)
2. Create a `DatabricksSqlSensor` using `sql_warehouse_name`:
   ```python
   DatabricksSqlSensor(
       task_id="test_sensor",
       databricks_conn_id="databricks_default",
       sql_warehouse_name="My SQL Warehouse",
       sql="SELECT 1",
   )
   ```
3. Run the DAG — the sensor always fails with `AirflowException: Can't list Databricks SQL endpoints`
4. This is 100% reproducible because the response key mismatch is deterministic
### Anything else

- This affects all operators and sensors that use `sql_warehouse_name` / `sql_endpoint_name` to resolve a warehouse, including `DatabricksSqlSensor`, `DatabricksPartitionSensor`, and any usage of `DatabricksSqlHook` with the `sql_endpoint_name` parameter.
- The `LIST_SQL_ENDPOINTS_ENDPOINT` constant was updated to use the new API path (`2.0/sql/warehouses`), but the response parsing was not updated to match the new response key (`"warehouses"` instead of `"endpoints"`).
- Many production DAGs use `sql_warehouse_name` for readability and would all need to be refactored to use `http_path` as a workaround.
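Until a fixed provider release ships, DAGs that cannot switch to `http_path` could normalize the list response at the `_do_api_call` boundary instead. The helper below is a hedged sketch; the commented wiring assumes the hook internals shown above and should be verified against the installed provider version before use:

```python
def normalize_list_response(result):
    """Re-expose the new 'warehouses' key under the legacy 'endpoints' name,
    so the hook's existing "endpoints" check passes. Pure function; no Airflow
    import needed to test it."""
    if isinstance(result, dict) and "warehouses" in result and "endpoints" not in result:
        return {**result, "endpoints": result["warehouses"]}
    return result

# Hypothetical wiring in a DAG file or plugin (verify against your provider):
#
# from airflow.providers.databricks.hooks.databricks_sql import DatabricksSqlHook
# _orig = DatabricksSqlHook._do_api_call
# DatabricksSqlHook._do_api_call = lambda self, *a, **kw: normalize_list_response(
#     _orig(self, *a, **kw)
# )
```

This keeps the workaround in one place rather than rewriting every task that uses `sql_warehouse_name`, and it can be deleted once the provider fix lands.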
### Are you willing to submit PR?
- [x] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]