pankajkoti commented on code in PR #45188:
URL: https://github.com/apache/airflow/pull/45188#discussion_r1900587419
##########
providers/src/airflow/providers/databricks/operators/databricks.py:
##########
@@ -292,6 +292,8 @@ class DatabricksCreateJobsOperator(BaseOperator):
    :param databricks_retry_delay: Number of seconds to wait between retries (it
        might be a floating point number).
    :param databricks_retry_args: An optional dictionary with arguments passed
        to ``tenacity.Retrying`` class.
+   :param databricks_environments: An optional list of task execution environment specifications
Review Comment:
can we call this just `environments`? the `databricks` prefix seems redundant. same suggestion across the changes in the PR.
##########
providers/src/airflow/providers/databricks/operators/databricks.py:
##########
@@ -1087,7 +1100,9 @@ def _get_run_json(self) -> dict[str, Any]:
elif self.existing_cluster_id:
run_json["existing_cluster_id"] = self.existing_cluster_id
else:
-            raise ValueError("Must specify either existing_cluster_id or new_cluster.")
Review Comment:
IMO, let's keep the else branch and the error, altering the message to say that one of existing_cluster_id, the new_cluster spec, or databricks_environments needs to be specified, and also add another `elif` block to verify that self.databricks_environments is set.
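For concreteness, the validation structure the reviewer seems to suggest could look roughly like the standalone sketch below. The function name `build_run_json`, its keyword arguments, and the `environments` key are hypothetical stand-ins for the operator's attributes, not the PR's actual implementation:

```python
def build_run_json(run_name, new_cluster=None, existing_cluster_id=None,
                   databricks_environments=None):
    """Hedged sketch: validate that exactly one cluster/environment option is given."""
    run_json = {"run_name": run_name}
    if new_cluster:
        run_json["new_cluster"] = new_cluster
    elif existing_cluster_id:
        run_json["existing_cluster_id"] = existing_cluster_id
    elif databricks_environments:
        # Serverless execution with explicit environment specs,
        # handled in its own elif branch as the review suggests.
        run_json["environments"] = databricks_environments
    else:
        # The else branch and the error are kept, with the message
        # broadened to mention all three accepted options.
        raise ValueError(
            "Must specify one of existing_cluster_id, new_cluster, "
            "or databricks_environments."
        )
    return run_json
```

This keeps the fail-fast behavior of the original `else` while still allowing serverless runs when environments are provided explicitly.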
##########
providers/src/airflow/providers/databricks/operators/databricks.py:
##########
@@ -1087,7 +1100,9 @@ def _get_run_json(self) -> dict[str, Any]:
elif self.existing_cluster_id:
run_json["existing_cluster_id"] = self.existing_cluster_id
else:
-            raise ValueError("Must specify either existing_cluster_id or new_cluster.")
+            self.log.info("The task %s will be executed in serverless mode", run_json["run_name"])
+            if self.databricks_environments:
Review Comment:
this could be moved into an `elif` block before the above `else` block.
##########
providers/src/airflow/providers/databricks/operators/databricks.py:
##########
@@ -1372,27 +1387,29 @@ def _convert_to_databricks_workflow_task(
class DatabricksTaskOperator(DatabricksTaskBaseOperator):
"""
-    Runs a task on Databricks using an Airflow operator.
-
-    The DatabricksTaskOperator allows users to launch and monitor task job runs on Databricks as Airflow
-    tasks. It can be used as a part of a DatabricksWorkflowTaskGroup to take advantage of job clusters, which
-    allows users to run their tasks on cheaper clusters that can be shared between tasks.
+    Runs a task on Databricks using an Airflow operator.

-    .. seealso::
-        For more information on how to use this operator, take a look at the guide:
-        :ref:`howto/operator:DatabricksTaskOperator`
+    The DatabricksTaskOperator allows users to launch and monitor task job runs on Databricks as Airflow
Review Comment:
why are we changing the indent here?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]