[GitHub] [airflow] josh-fell commented on a diff in pull request #30204: Databricks SQL Sensor

via GitHub Mon, 27 Mar 2023 11:53:45 -0700


josh-fell commented on code in PR #30204:
URL: https://github.com/apache/airflow/pull/30204#discussion_r1149663228



##########
tests/system/providers/databricks/example_databricks_sensor.py:
##########
@@ -0,0 +1,80 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+import os
+import textwrap
+from datetime import datetime
+
+from airflow import DAG
+from airflow.providers.databricks.sensors.sql import DatabricksSqlSensor
+
+# [Env variable to be used from the OS]
+ENV_ID = os.environ.get("SYSTEM_TESTS_ENV_ID")
+# [DAG name to be shown on Airflow UI]
+DAG_ID = "example_databricks_sensor"
+
+with DAG(
+    dag_id=DAG_ID,
+    schedule="@daily",
+    start_date=datetime(2021, 1, 1),
+    tags=["example"],
+    catchup=False,
+) as dag:
+    dag.doc_md = textwrap.dedent(
+        """
+
+        This is an example DAG which uses the DatabricksSqlSensor
+        sensor. The example task in the DAG executes the provided
+        SQL query against the Databricks SQL warehouse and if a
+        result is returned, the sensor returns True/succeeds.
+        If no results are returned, the sensor returns False/
+        fails.
+
+        """
+    )
+    # [START howto_sensor_databricks_connection_setup]
+    # Connection string setup for Databricks workspace.
+    connection_id = "databricks_default"
+    sql_endpoint_name = "Starter Warehouse"
+    # [END howto_sensor_databricks_connection_setup]
+
+    # [START howto_sensor_databricks_sql]
+    # Example of using the Databricks SQL Sensor to check existence of data/partitions for a Delta table.
+    sql_sensor = DatabricksSqlSensor(
+        databricks_conn_id=connection_id,
+        sql_endpoint_name=sql_endpoint_name,
+        catalog="hive_metastore",
+        task_id="sql_sensor_task",
+        sql="select * from hive_metastore.temp.sample_table_3 limit 1",
+        timeout=60 * 2,
+    )
+    # [END howto_sensor_databricks_sql]
+
+    (sql_sensor)
+
+    from tests.system.utils.watcher import watcher
+
+    # This test needs watcher in order to properly mark success/failure
+    # when "tearDown" task with trigger rule is part of the DAG
+    list(dag.tasks) >> watcher()

Review Comment:
   ```suggestion
   ```
   The `(sql_sensor)` expression doesn't look necessary. Also, you don't need the `watcher()` since this DAG only has 1 task and there are no tasks which are considered "teardown" tasks.



##########
airflow/providers/databricks/sensors/sql.py:
##########
@@ -0,0 +1,124 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+"""This module contains Databricks sensors."""
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING, Any, Callable, Iterable, Sequence
+
+from airflow.compat.functools import cached_property
+from airflow.providers.common.sql.hooks.sql import fetch_all_handler
+from airflow.providers.databricks.hooks.databricks_sql import DatabricksSqlHook
+from airflow.sensors.base import BaseSensorOperator
+
+if TYPE_CHECKING:
+    from airflow.utils.context import Context
+
+
+class DatabricksSqlSensor(BaseSensorOperator):
+    """
+    Sensor to execute SQL statements on a Delta table via Databricks.
+
+        :param databricks_conn_id: Reference to :ref:`Databricks
+            connection id<howto/connection:databricks>` (templated), defaults to
+            DatabricksSqlHook.default_conn_name
+        :param http_path: Optional string specifying HTTP path of Databricks SQL Endpoint or cluster.
+            If not specified, it should be either specified in the Databricks connection's
+            extra parameters, or ``sql_endpoint_name`` must be specified.
+        :param sql_endpoint_name: Optional name of Databricks SQL Endpoint. If not specified, ``http_path``
+            must be provided as described above, defaults to None
+        :param session_configuration: An optional dictionary of Spark session parameters. If not specified,
+            it could be specified in the Databricks connection's extra parameters., defaults to None
+        :param http_headers: An optional list of (k, v) pairs
+            that will be set as HTTP headers on every request. (templated).
+        :param catalog: An optional initial catalog to use.
+            Requires DBR version 9.0+ (templated), defaults to ""
+        :param schema: An optional initial schema to use.
+            Requires DBR version 9.0+ (templated), defaults to "default"
+        :param sql: SQL statement to be executed.
+        :param handler: Handler for DbApiHook.run() to return results, defaults to fetch_all_handler
+        :param client_parameters: Additional parameters internal to Databricks SQL Connector parameters.

Review Comment:
   ```suggestion
       :param databricks_conn_id: Reference to :ref:`Databricks
           connection id<howto/connection:databricks>` (templated), defaults to
           DatabricksSqlHook.default_conn_name
       :param http_path: Optional string specifying HTTP path of Databricks SQL Endpoint or cluster.
           If not specified, it should be either specified in the Databricks connection's
           extra parameters, or ``sql_endpoint_name`` must be specified.
       :param sql_endpoint_name: Optional name of Databricks SQL Endpoint. If not specified, ``http_path``
           must be provided as described above, defaults to None
       :param session_configuration: An optional dictionary of Spark session parameters. If not specified,
           it could be specified in the Databricks connection's extra parameters., defaults to None
       :param http_headers: An optional list of (k, v) pairs
           that will be set as HTTP headers on every request. (templated).
       :param catalog: An optional initial catalog to use.
           Requires DBR version 9.0+ (templated), defaults to ""
       :param schema: An optional initial schema to use.
           Requires DBR version 9.0+ (templated), defaults to "default"
       :param sql: SQL statement to be executed.
       :param handler: Handler for DbApiHook.run() to return results, defaults to fetch_all_handler
       :param client_parameters: Additional parameters internal to Databricks SQL Connector parameters.
   ```
   The `param` directives need to match the left alignment of the docstring; otherwise, the params won't render correctly in the Python API docs.
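One way to see the alignment problem: `inspect.getdoc()` (which applies `inspect.cleandoc()`) removes only the indentation common to the docstring body, so over-indented `:param` lines keep their extra spaces. In reStructuredText an indented block after a paragraph is parsed as a block quote rather than a top-level field list, which is presumably why Sphinx won't render them as parameters. The class names and the `indented_param_lines` helper below are hypothetical, used only for illustration.

```python
from __future__ import annotations

import inspect


class MisalignedSensor:
    """
    Sensor to execute SQL statements on a Delta table via Databricks.

        :param sql: SQL statement to be executed.
        :param catalog: An optional initial catalog to use.
    """


class AlignedSensor:
    """
    Sensor to execute SQL statements on a Delta table via Databricks.

    :param sql: SQL statement to be executed.
    :param catalog: An optional initial catalog to use.
    """


def indented_param_lines(obj: object) -> list[str]:
    """Return ``:param`` lines that are still indented after docstring cleanup."""
    doc = inspect.getdoc(obj) or ""
    return [
        line
        for line in doc.splitlines()
        if line.lstrip().startswith(":param") and line != line.lstrip()
    ]


print(indented_param_lines(MisalignedSensor))
# ['    :param sql: SQL statement to be executed.', '    :param catalog: An optional initial catalog to use.']
print(indented_param_lines(AlignedSensor))  # []
```

A check like this could run over a provider module's classes to catch the misalignment before the docs build.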



##########
tests/providers/databricks/sensors/test_sql.py:
##########


Review Comment:
   WDYT about adding a test for the interaction between `http_path` and `sql_endpoint_name` based on the sensor docstring:
   
   ```rst
   :param http_path: Optional string specifying HTTP path of Databricks SQL Endpoint or cluster.
               **If not specified, it should be either specified in the Databricks connection's
               extra parameters, or ``sql_endpoint_name`` must be specified.**
   :param sql_endpoint_name: Optional name of Databricks SQL Endpoint. **If not specified, ``http_path``
               must be provided as described above**, defaults to None
   ...
   ```
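The cases such a test could cover can be sketched against a hypothetical `resolve_http_path` function that encodes only the docstring's rules. It is not `DatabricksSqlHook`'s code: a real test would presumably construct `DatabricksSqlSensor` and mock the hook's connection and endpoint lookup (e.g. with `unittest.mock.patch`), expecting an `AirflowException` rather than the `ValueError` used here. The order in which the connection extra and `sql_endpoint_name` are checked is an assumption, so no test below depends on it.

```python
from __future__ import annotations

# Hypothetical stand-in for http_path resolution, written only from the sensor
# docstring. The check order (extra vs. sql_endpoint_name) is an assumption.


def resolve_http_path(
    http_path: str | None,
    sql_endpoint_name: str | None,
    conn_extra: dict[str, str],
    endpoints: dict[str, str],
) -> str:
    """Pick the HTTP path to connect to, following the documented rules."""
    if http_path:
        return http_path
    if sql_endpoint_name:
        if sql_endpoint_name not in endpoints:
            raise ValueError(f"No SQL endpoint named {sql_endpoint_name!r}")
        return endpoints[sql_endpoint_name]
    if "http_path" in conn_extra:
        return conn_extra["http_path"]
    raise ValueError("Provide http_path, set it in the connection extras, or pass sql_endpoint_name")


# Fake endpoint listing; a real test would mock the Databricks API call instead.
ENDPOINTS = {"Starter Warehouse": "/sql/1.0/warehouses/fake-id"}


def test_explicit_http_path_is_used():
    assert resolve_http_path("/sql/1.0/endpoints/abc", None, {}, ENDPOINTS) == "/sql/1.0/endpoints/abc"


def test_sql_endpoint_name_is_resolved():
    assert resolve_http_path(None, "Starter Warehouse", {}, ENDPOINTS) == "/sql/1.0/warehouses/fake-id"


def test_http_path_from_connection_extra():
    assert resolve_http_path(None, None, {"http_path": "/from/extra"}, ENDPOINTS) == "/from/extra"


def test_neither_http_path_nor_endpoint_name_raises():
    try:
        resolve_http_path(None, None, {}, ENDPOINTS)
    except ValueError:
        return
    raise AssertionError("expected ValueError when nothing identifies the endpoint")


def test_unknown_endpoint_name_raises():
    try:
        resolve_http_path(None, "No Such Warehouse", {}, ENDPOINTS)
    except ValueError:
        return
    raise AssertionError("expected ValueError for an unknown endpoint name")
```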


