pankajkoti commented on code in PR #45188:
URL: https://github.com/apache/airflow/pull/45188#discussion_r1900587419
##########
providers/src/airflow/providers/databricks/operators/databricks.py:
##########
@@ -292,6 +292,8 @@ class DatabricksCreateJobsOperator(BaseOperator):
    :param databricks_retry_delay: Number of seconds to wait between retries (it
        might be a floating point number).
    :param databricks_retry_args: An optional dictionary with arguments passed
        to ``tenacity.Retrying`` class.
+   :param databricks_environments: An optional list of task execution environment specifications
Review Comment:
can we call this just `environments`? the `databricks` prefix seems redundant. same suggestion across the changes in the PR.
##########
providers/src/airflow/providers/databricks/operators/databricks.py:
##########
@@ -1087,7 +1100,9 @@ def _get_run_json(self) -> dict[str, Any]:
elif self.existing_cluster_id:
run_json["existing_cluster_id"] = self.existing_cluster_id
else:
-            raise ValueError("Must specify either existing_cluster_id or new_cluster.")
Review Comment:
IMO, let's keep the else branch and the error, altering the message to say that one of existing_cluster_id, the new_cluster spec, or databricks_environments needs to be specified, and also add another `elif` block to verify that self.databricks_environments is set.
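For concreteness, the validation structure the reviewer seems to suggest could look roughly like the standalone sketch below. The function name `build_run_json`, its keyword arguments, and the `environments` key are hypothetical stand-ins for the operator's attributes, not the PR's actual implementation:

```python
def build_run_json(run_name, new_cluster=None, existing_cluster_id=None,
                   databricks_environments=None):
    """Hedged sketch: validate that exactly one cluster/environment option is given."""
    run_json = {"run_name": run_name}
    if new_cluster:
        run_json["new_cluster"] = new_cluster
    elif existing_cluster_id:
        run_json["existing_cluster_id"] = existing_cluster_id
    elif databricks_environments:
        # Serverless execution with explicit environment specs,
        # handled in its own elif branch as the review suggests.
        run_json["environments"] = databricks_environments
    else:
        # The else branch and the error are kept, with the message
        # broadened to mention all three accepted options.
        raise ValueError(
            "Must specify one of existing_cluster_id, new_cluster, "
            "or databricks_environments."
        )
    return run_json
```

This keeps the fail-fast behavior of the original `else` while still allowing serverless runs when environments are provided explicitly.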
##########
providers/src/airflow/providers/databricks/operators/databricks.py:
##########
@@ -1087,7 +1100,9 @@ def _get_run_json(self) -> dict[str, Any]:
elif self.existing_cluster_id:
run_json["existing_cluster_id"] = self.existing_cluster_id
else:
-            raise ValueError("Must specify either existing_cluster_id or new_cluster.")
+            self.log.info("The task %s will be executed in serverless mode", run_json["run_name"])
+            if self.databricks_environments:
Review Comment:
this could be moved into an `elif` block before the above `else` block.
##########
providers/src/airflow/providers/databricks/operators/databricks.py:
##########
@@ -1372,27 +1387,29 @@ def _convert_to_databricks_workflow_task(
class DatabricksTaskOperator(DatabricksTaskBaseOperator):
"""
-    Runs a task on Databricks using an Airflow operator.
-
-    The DatabricksTaskOperator allows users to launch and monitor task job runs on Databricks as Airflow
-    tasks. It can be used as a part of a DatabricksWorkflowTaskGroup to take advantage of job clusters, which
-    allows users to run their tasks on cheaper clusters that can be shared between tasks.
+    Runs a task on Databricks using an Airflow operator.

-    .. seealso::
-        For more information on how to use this operator, take a look at the guide:
-        :ref:`howto/operator:DatabricksTaskOperator`
+    The DatabricksTaskOperator allows users to launch and monitor task job runs on Databricks as Airflow
Review Comment:
why are we changing the indent here?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]