[superset] branch master updated: feat: Databricks native driver (#20320)

beto Thu, 09 Jun 2022 15:35:41 -0700

This is an automated email from the ASF dual-hosted git repository.

beto pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/superset.git



The following commit(s) were added to refs/heads/master by this push:
     new ec331e683e feat: Databricks native driver (#20320)
ec331e683e is described below

commit ec331e683e03e2422e956729f3f32a2442f7d82c
Author: Beto Dealmeida <[email protected]>
AuthorDate: Thu Jun 9 15:34:49 2022 -0700

    feat: Databricks native driver (#20320)
---
 docs/docs/databases/databricks.mdx     | 49 +++++++++++++++++++++++++---------
 setup.py                               |  5 +++-
 superset/db_engine_specs/databricks.py |  6 +++++
 3 files changed, 47 insertions(+), 13 deletions(-)
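The `databricks` extra in setup.py now installs `databricks-sql-connector` and `sqlalchemy-databricks` instead of `databricks-dbapi[sqlalchemy]`. To check what `pip install "superset[databricks]"` actually installed, you can query package metadata. The sketch below is a minimal example that uses only the standard library's `importlib.metadata` (Python 3.8+). The helper names are hypothetical. It reports versions but does not enforce the bounds from setup.py (`>=2.0.2, <3` and `>=0.2.0`). `packaging.specifiers.SpecifierSet` is the usual tool for that.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Distribution names taken from the "databricks" extra in setup.py.
DATABRICKS_EXTRA = ("databricks-sql-connector", "sqlalchemy-databricks")


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


def check_databricks_extra() -> Dict[str, Optional[str]]:
    """Map each distribution in the extra to its installed version (or None)."""
    return {dist: installed_version(dist) for dist in DATABRICKS_EXTRA}


if __name__ == "__main__":
    for dist, ver in check_databricks_extra().items():
        print(f"{dist}: {ver or 'not installed'}")
```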

diff --git a/docs/docs/databases/databricks.mdx b/docs/docs/databases/databricks.mdx
index 9c8ddafebd..4070960ce2 100644
--- a/docs/docs/databases/databricks.mdx
+++ b/docs/docs/databases/databricks.mdx
@@ -7,16 +7,12 @@ version: 1
 
 ## Databricks
 
-To connect to Databricks, first install [databricks-dbapi](https://pypi.org/project/databricks-dbapi/) with the optional SQLAlchemy dependencies:
+Databricks now offer a native DB API 2.0 driver, `databricks-sql-connector`, that can be used with the `sqlalchemy-databricks` dialect. You can install both with:
 
 ```bash
-pip install databricks-dbapi[sqlalchemy]
+pip install "superset[databricks]"
 ```
 
-There are two ways to connect to Databricks: using a Hive connector or an ODBC connector. Both ways work similarly, but only ODBC can be used to connect to [SQL endpoints](https://docs.databricks.com/sql/admin/sql-endpoints.html).
-
-### Hive
-
 To use the Hive connector you need the following information from your cluster:
 
 - Server hostname
@@ -27,15 +23,44 @@ These can be found under "Configuration" -> "Advanced Options" -> "JDBC/ODBC".
 
 You also need an access token from "Settings" -> "User Settings" -> "Access Tokens".
 
-Once you have all this information, add a database of type "Databricks (Hive)" in Superset, and use the following SQLAlchemy URI:
+Once you have all this information, add a database of type "Databricks Native Connector" and use the following SQLAlchemy URI:
 
 ```
-databricks+pyhive://token:{access token}@{server hostname}:{port}/{database name}
+databricks+connector://token:{access_token}@{server_hostname}:{port}/{database_name}
 ```
 
 You also need to add the following configuration to "Other" -> "Engine Parameters", with your HTTP path:
 
+```json
+{
+    "connect_args": {"http_path": "sql/protocolv1/o/****"},
+    "http_headers": [["User-Agent", "Apache Superset"]]
+}
 ```
+
+The `User-Agent` header is optional, but helps Databricks identify traffic from Superset. If you need to use a different header please reach out to Databricks and let them know.
+
+## Older driver
+
+Originally Superset used `databricks-dbapi` to connect to Databricks. You might want to try it if you're having problems with the official Databricks connector:
+
+```bash
+pip install "databricks-dbapi[sqlalchemy]"
+```
+
+There are two ways to connect to Databricks when using `databricks-dbapi`: using a Hive connector or an ODBC connector. Both ways work similarly, but only ODBC can be used to connect to [SQL endpoints](https://docs.databricks.com/sql/admin/sql-endpoints.html).
+
+### Hive
+
+To connect to a Hive cluster add a database of type "Databricks Interactive Cluster" in Superset, and use the following SQLAlchemy URI:
+
+```
+databricks+pyhive://token:{access_token}@{server_hostname}:{port}/{database_name}
+```
+
+You also need to add the following configuration to "Other" -> "Engine Parameters", with your HTTP path:
+
+```json
 {"connect_args": {"http_path": "sql/protocolv1/o/****"}}
 ```
 
@@ -43,15 +68,15 @@ You also need to add the following configuration to "Other" -> "Engine Parameter
 
 For ODBC you first need to install the [ODBC drivers for your platform](https://databricks.com/spark/odbc-drivers-download).
 
-For a regular connection use this as the SQLAlchemy URI:
+For a regular connection use this as the SQLAlchemy URI after selecting either "Databricks Interactive Cluster" or "Databricks SQL Endpoint" for the database, depending on your use case:
 
 ```
-databricks+pyodbc://token:{access token}@{server hostname}:{port}/{database name}
+databricks+pyodbc://token:{access_token}@{server_hostname}:{port}/{database_name}
 ```
 
 And for the connection arguments:
 
-```
+```json
 {"connect_args": {"http_path": "sql/protocolv1/o/****", "driver_path": "/path/to/odbc/driver"}}
 ```
 
@@ -62,6 +87,6 @@ The driver path should be:
 
 For a connection to a SQL endpoint you need to use the HTTP path from the endpoint:
 
-```
+```json
 {"connect_args": {"http_path": "/sql/1.0/endpoints/****", "driver_path": "/path/to/odbc/driver"}}
 ```
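For the native connector, the updated docs come down to two strings that you paste into Superset's database form: the SQLAlchemy URI (`databricks+connector://token:{access_token}@{server_hostname}:{port}/{database_name}`) and the "Engine Parameters" JSON with `connect_args.http_path` and the optional `User-Agent` header. The sketch below assembles both from those placeholders. It is a minimal, hypothetical helper and is not part of Superset. Percent-encoding the token is a defensive assumption in case it contains URL-reserved characters. The hostname, port, token, and HTTP path in the usage lines are made-up placeholders.

```python
import json
from typing import Tuple
from urllib.parse import quote


def build_native_connection(
    access_token: str,
    server_hostname: str,
    port: int,
    database_name: str,
    http_path: str,
) -> Tuple[str, str]:
    """Return (sqlalchemy_uri, engine_parameters_json) for "Databricks Native Connector"."""
    # Defensive: percent-encode the token so reserved characters cannot break the URI.
    uri = (
        f"databricks+connector://token:{quote(access_token, safe='')}"
        f"@{server_hostname}:{port}/{database_name}"
    )
    # Mirrors the JSON shown in the docs for "Other" -> "Engine Parameters".
    engine_params = {
        "connect_args": {"http_path": http_path},
        # Optional per the docs: helps Databricks identify traffic from Superset.
        "http_headers": [["User-Agent", "Apache Superset"]],
    }
    return uri, json.dumps(engine_params, indent=4)


if __name__ == "__main__":
    uri, params = build_native_connection(
        access_token="example-token",  # placeholder, not a real token
        server_hostname="example.cloud.databricks.com",  # placeholder
        port=443,  # placeholder; use the port shown under JDBC/ODBC
        database_name="default",
        http_path="sql/protocolv1/o/<org-id>/<cluster-id>",  # placeholder
    )
    print(uri)
    print(params)
```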
diff --git a/setup.py b/setup.py
index 556e170985..28a220271b 100644
--- a/setup.py
+++ b/setup.py
@@ -129,7 +129,10 @@ setup(
         "cockroachdb": ["cockroachdb>=0.3.5, <0.4"],
         "cors": ["flask-cors>=2.0.0"],
         "crate": ["crate[sqlalchemy]>=0.26.0, <0.27"],
-        "databricks": ["databricks-dbapi[sqlalchemy]>=0.5.0, <0.6"],
+        "databricks": [
+            "databricks-sql-connector>=2.0.2, <3",
+            "sqlalchemy-databricks>=0.2.0",
+        ],
         "db2": ["ibm-db-sa>=0.3.5, <0.4"],
         "dremio": ["sqlalchemy-dremio>=1.1.5, <1.3"],
         "drill": ["sqlalchemy-drill==0.1.dev"],
diff --git a/superset/db_engine_specs/databricks.py b/superset/db_engine_specs/databricks.py
index f5f46bf491..d010b520d0 100644
--- a/superset/db_engine_specs/databricks.py
+++ b/superset/db_engine_specs/databricks.py
@@ -65,3 +65,9 @@ class DatabricksODBCEngineSpec(BaseEngineSpec):
     @classmethod
     def epoch_to_dttm(cls) -> str:
         return HiveEngineSpec.epoch_to_dttm()
+
+
+class DatabricksNativeEngineSpec(DatabricksODBCEngineSpec):
+    engine = "databricks"
+    engine_name = "Databricks Native Connector"
+    driver = "connector"
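The new `DatabricksNativeEngineSpec` sets `engine = "databricks"` and `driver = "connector"`. This pair matches the `databricks+connector://` scheme in the documented URI. The sketch below shows, under stated assumptions, how a `backend+driver` scheme could be matched against such attributes. The base class and the pyhive entry are hypothetical stand-ins. In the real code, `DatabricksNativeEngineSpec` subclasses `DatabricksODBCEngineSpec` and inherits its behavior, such as `epoch_to_dttm`. Superset's actual lookup logic lives elsewhere and may differ. With SQLAlchemy, `make_url(uri).get_backend_name()` and `get_driver_name()` perform the scheme split properly.

```python
from typing import List, Optional, Tuple, Type


class BaseEngineSpec:
    """Hypothetical stand-in for Superset's BaseEngineSpec."""

    engine = "base"
    engine_name: Optional[str] = None
    driver: Optional[str] = None


class DatabricksNativeEngineSpec(BaseEngineSpec):
    # Attribute values copied from the diff above (real base class differs).
    engine = "databricks"
    engine_name = "Databricks Native Connector"
    driver = "connector"


class HypotheticalDatabricksPyhiveSpec(BaseEngineSpec):
    # Hypothetical entry for the older databricks+pyhive:// scheme.
    engine = "databricks"
    engine_name = "Databricks Interactive Cluster"
    driver = "pyhive"


SPECS: List[Type[BaseEngineSpec]] = [
    DatabricksNativeEngineSpec,
    HypotheticalDatabricksPyhiveSpec,
]


def split_scheme(uri: str) -> Tuple[str, str]:
    """Split 'backend+driver://...' into (backend, driver); driver may be ''."""
    scheme = uri.split("://", 1)[0]
    backend, _, driver = scheme.partition("+")
    return backend, driver


def find_spec(uri: str) -> Optional[Type[BaseEngineSpec]]:
    """Return the first spec whose (engine, driver) pair matches the URI scheme."""
    backend, driver = split_scheme(uri)
    for spec in SPECS:
        if spec.engine == backend and spec.driver == driver:
            return spec
    return None


if __name__ == "__main__":
    spec = find_spec("databricks+connector://token:x@host:443/default")
    print(spec.engine_name if spec else "no match")
```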
