freget commented on a change in pull request #19335:
URL: https://github.com/apache/airflow/pull/19335#discussion_r744172643
##########
File path: docs/apache-airflow-providers-databricks/connections/databricks.rst
##########
@@ -57,10 +58,19 @@ Password (optional)
Extra (optional)
Specify the extra parameter (as json dictionary) that can be used in the
Databricks connection.
-    This parameter is necessary if using the *PAT* authentication method (recommended):
+
+    Following parameter is necessary if using the *PAT* authentication method (recommended):
* ``token``: Specify PAT to use.
+ Following parameters are necessary if using authentication with AAD token:
+
+ * ``azure_client_id``: ID of the Azure Service Principal
Review comment:
Why don't we put the client_id and client_secret into the user and
password fields of the connection? Ultimately, it's just a user and a password.
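The reviewer's suggestion can be sketched as follows. This is a hypothetical helper, not code from the PR: `conn` stands in for an Airflow Connection object, with the Service Principal's client ID in `login` and its secret in `password`; only the tenant ID would still live in the extra dict.

```python
from types import SimpleNamespace

def get_sp_credentials(conn):
    """Read Azure Service Principal credentials from the connection's
    login/password fields instead of the extra dict (hypothetical sketch)."""
    extra = conn.extra_dejson
    if not (conn.login and conn.password and 'azure_tenant_id' in extra):
        raise ValueError(
            'SP auth needs login, password and azure_tenant_id in extras'
        )
    return {
        'client_id': conn.login,
        'client_secret': conn.password,
        'tenant_id': extra['azure_tenant_id'],
    }

# Stand-in for an Airflow Connection, for illustration only:
conn = SimpleNamespace(
    login='my-sp-client-id',
    password='my-sp-secret',
    extra_dejson={'azure_tenant_id': 'my-tenant'},
)
creds = get_sp_credentials(conn)
```

This keeps the secret in the connection's password field, which Airflow already masks in the UI, rather than in a plain-text extra.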
##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -168,13 +174,45 @@ def _do_api_call(self, endpoint_info, json):
self.databricks_conn = self.get_connection(self.databricks_conn_id)
+ headers = USER_AGENT_HEADER.copy()
+
if 'token' in self.databricks_conn.extra_dejson:
self.log.info('Using token auth. ')
auth = _TokenAuth(self.databricks_conn.extra_dejson['token'])
if 'host' in self.databricks_conn.extra_dejson:
host = self._parse_host(self.databricks_conn.extra_dejson['host'])
else:
host = self.databricks_conn.host
+ elif (
+ 'azure_client_id' in self.databricks_conn.extra_dejson
+ and 'azure_client_secret' in self.databricks_conn.extra_dejson
+ and 'azure_tenant_id' in self.databricks_conn.extra_dejson
Review comment:
Requiring all three values to be present may result in unexpected
behavior. Maybe we should raise an exception if at least one, but not all
three, of the azure_* configs are set?
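The validation the reviewer asks for could look like this. A minimal sketch with a hypothetical `check_azure_extras` helper; the real hook would raise `AirflowException` rather than `ValueError`:

```python
AZURE_SP_KEYS = ('azure_client_id', 'azure_client_secret', 'azure_tenant_id')

def check_azure_extras(extra):
    """Return True if Service Principal auth is fully configured,
    False if no azure_* key is present at all.

    Raises ValueError on a partial configuration instead of silently
    falling through to another auth method."""
    present = [k for k in AZURE_SP_KEYS if k in extra]
    if not present:
        return False
    missing = [k for k in AZURE_SP_KEYS if k not in extra]
    if missing:
        raise ValueError(f'Incomplete Azure SP config, missing: {missing}')
    return True
```

With this, `extra={'azure_client_id': '...'}` alone fails loudly instead of being treated as "no Azure auth configured".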
##########
File path: setup.py
##########
@@ -244,6 +244,7 @@ def write_version(filename: str = os.path.join(*[my_dir, "airflow", "git_version
]
databricks = [
'requests>=2.26.0, <3',
+ 'azure-identity>=1.3.1',
Review comment:
In #19433 I deliberately omitted adding dependencies on Azure-specific
libraries. AWS users do not need this dependency; however, using plain REST
API calls does not yield code as clean as azure-identity does. I don't have
a clear opinion on which is better...
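For comparison, the plain-REST alternative the reviewer alludes to is roughly the following. A sketch, assuming the OAuth2 client-credentials grant against the Microsoft identity platform v2.0 endpoint; the resource ID below is the publicly documented application ID of the Azure Databricks service, and `build_token_request` is a hypothetical helper:

```python
# Application ID of the Azure Databricks service (publicly documented).
DATABRICKS_RESOURCE_ID = '2ff814a6-3304-4ab8-85cb-cd0e6f879c1d'

def build_token_request(tenant_id, client_id, client_secret):
    """Return (url, form_data) for an AAD client-credentials token request,
    built without azure-identity."""
    url = f'https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token'
    data = {
        'grant_type': 'client_credentials',
        'client_id': client_id,
        'client_secret': client_secret,
        'scope': f'{DATABRICKS_RESOURCE_ID}/.default',
    }
    return url, data

# The actual call would then be something like:
#   resp = requests.post(url, data=data)
#   token = resp.json()['access_token']
```

This avoids the extra dependency at the cost of hand-rolling token caching and refresh, which azure-identity handles for free.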
##########
File path: airflow/providers/databricks/hooks/databricks.py
##########
@@ -168,13 +174,45 @@ def _do_api_call(self, endpoint_info, json):
self.databricks_conn = self.get_connection(self.databricks_conn_id)
+ headers = USER_AGENT_HEADER.copy()
+
if 'token' in self.databricks_conn.extra_dejson:
self.log.info('Using token auth. ')
auth = _TokenAuth(self.databricks_conn.extra_dejson['token'])
if 'host' in self.databricks_conn.extra_dejson:
host = self._parse_host(self.databricks_conn.extra_dejson['host'])
else:
host = self.databricks_conn.host
+ elif (
+ 'azure_client_id' in self.databricks_conn.extra_dejson
+ and 'azure_client_secret' in self.databricks_conn.extra_dejson
+ and 'azure_tenant_id' in self.databricks_conn.extra_dejson
+ ):
+ self.log.info('Using AAD Token for SPN. ')
+
+ if 'host' in self.databricks_conn.extra_dejson:
+            host = self._parse_host(self.databricks_conn.extra_dejson['host'])
Review comment:
To be honest, I did not understand why `host` is allowed as an extra
parameter for token authentication; I suspect it is there for backward
compatibility. For user/password authentication, `host` may only be supplied
as a normal connection argument. I would prefer to handle SP-based
authentication the same way.
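The host-resolution policy the reviewer prefers can be sketched as a small helper. `resolve_host` is hypothetical; in this sketch the regular connection `host` field always wins, and `host` in extras is honored only when explicitly allowed (i.e., for token auth, as backward compatibility):

```python
def resolve_host(conn_host, extra, allow_extra_host=False):
    """Pick the Databricks workspace host (hypothetical sketch).

    For SP-based auth, only the connection's regular `host` field counts;
    `host` in the extra dict is honored solely when allow_extra_host=True,
    mirroring the legacy token-auth behavior."""
    if allow_extra_host and 'host' in extra:
        return extra['host'].rstrip('/')
    return conn_host
```

Keeping extras out of host resolution for the new auth path means the legacy behavior stays intact without spreading it to new code.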
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]