josh-fell commented on a change in pull request #19248:
URL: https://github.com/apache/airflow/pull/19248#discussion_r737561941



##########
File path: airflow/providers/google/cloud/operators/dataproc.py
##########
@@ -2159,3 +2160,311 @@ def execute(self, context: Dict):
         )
         operation.result()
         self.log.info("Updated %s cluster.", self.cluster_name)
+
+
+class DataprocCreateBatchOperator(BaseOperator):
+    """
+    Creates a batch workload.
+
+    :param project_id: Required. The ID of the Google Cloud project that the cluster belongs to.
+    :type project_id: str
+    :param region: Required. The Cloud Dataproc region in which to handle the request.
+    :type region: str
+    :param batch: Required. The batch to create.
+    :type batch: google.cloud.dataproc_v1.types.Batch
+    :param batch_id: Optional. The ID to use for the batch, which will become the final component
+        of the batch's resource name.
+        This value must be 4-63 characters. Valid characters are /[a-z][0-9]-/.
+    :type batch_id: str
+    :param request_id: Optional. A unique id used to identify the request. If the server receives two
+        ``CreateBatchRequest`` requests with the same id, then the second request will be ignored and
+        the first ``google.longrunning.Operation`` created and stored in the backend is returned.
+    :type request_id: str
+    :param retry: A retry object used to retry requests. If ``None`` is specified, requests will not be
+        retried.
+    :type retry: google.api_core.retry.Retry
+    :param timeout: The amount of time, in seconds, to wait for the request to complete. Note that if
+        ``retry`` is specified, the timeout applies to each individual attempt.
+    :type timeout: float
+    :param metadata: Additional metadata that is provided to the method.
+    :type metadata: Sequence[Tuple[str, str]]
+    """

Review comment:
       The docstring is missing `gcp_conn_id` and `impersonation_chain`.
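
       For reference, a sketch of what those entries might look like, modeled on how sibling operators in the Google provider document them (the exact wording here is an assumption, not the final text):

       ```rst
       :param gcp_conn_id: Optional. The connection ID to use connecting to Google Cloud.
       :type gcp_conn_id: str
       :param impersonation_chain: Optional service account to impersonate using short-term
           credentials, or chained list of accounts required to get the access token
           of the last account in the list, which will be impersonated in the request.
       :type impersonation_chain: Union[str, Sequence[str]]
       ```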

##########
File path: airflow/providers/google/cloud/example_dags/example_dataproc.py
##########
@@ -252,3 +263,40 @@
 
     # Task dependency created via `XComArgs`:
     #   spark_task_async >> spark_task_async_sensor
+
+with models.DAG(
+    "example_gcp_batch_dataproc",
+    schedule_interval='@once',
+    start_date=days_ago(1),

Review comment:
       ```suggestion
       "example_gcp_batch_dataproc",
       schedule_interval='@once',
       start_date=datetime(2021, 1, 1),
       catchup=False,
       ```
   
   There is an almost-finished effort to transition away from using `start_date=days_ago(n)` in example DAGs to using a static `datetime(...)` value as a best practice. New example DAGs should follow the static `start_date` approach; the exact value doesn't matter as long as it's static.
   
   Also, adding `catchup=False` has recently been discussed as an addition for all example DAGs, to help prevent accidental explosions of DAG runs if new users change the `schedule_interval` without fully understanding that `catchup=True` is the default.
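
   Put together, a minimal sketch of the recommended pattern (the DAG ID, date, and schedule are illustrative values carried over from this example):

   ```python
   from datetime import datetime

   from airflow import models

   with models.DAG(
       "example_gcp_batch_dataproc",
       schedule_interval='@once',
       start_date=datetime(2021, 1, 1),  # static value; the exact date doesn't matter
       catchup=False,  # guards against accidental backfills if the schedule changes
   ) as dag:
       ...  # Dataproc batch tasks go here
   ```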




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

