jaketf commented on a change in pull request #8550:
URL: https://github.com/apache/airflow/pull/8550#discussion_r427500482



##########
File path: airflow/providers/google/cloud/hooks/dataflow.py
##########
@@ -583,6 +623,49 @@ def start_template_dataflow(
         jobs_controller.wait_for_done()
         return response["job"]
 
+    @GoogleBaseHook.fallback_to_default_project_id
+    def start_flex_template(
+        self,
+        body: Dict,
+        location: str,
+        project_id: str,
+        on_new_job_id_callback: Optional[Callable[[str], None]] = None
+    ):
+        """
+        Starts a flex template with the Dataflow pipeline.
+
+        :param body: The request body
+        :param location: The location of the Dataflow job (for example europe-west1)
+        :type location: str
+        :param project_id: The ID of the GCP project that owns the job.
+            If set to ``None`` or missing, the default project_id from the GCP connection is used.
+        :type project_id: Optional[str]
+        :param on_new_job_id_callback: A callback that is called when a Job ID is detected.
+        :return: the Job
+        """
+        service = self.get_conn()
+        request = service.projects().locations().flexTemplates().launch(  # pylint: disable=no-member
+            projectId=project_id,
+            body=body,
+            location=location
+        )
+        response = request.execute(num_retries=self.num_retries)
+        job_id = response['job']['id']
+
+        if on_new_job_id_callback:
+            on_new_job_id_callback(job_id)
+
+        jobs_controller = _DataflowJobsController(
+            dataflow=self.get_conn(),
+            project_number=project_id,
+            job_id=job_id,
+            location=location,
+            poll_sleep=self.poll_sleep,
+            num_retries=self.num_retries)
+        jobs_controller.wait_for_done()

Review comment:
       @mik-laj I was reflecting on this in light of the data fusion operator issue.
   This "start" naming is confusing.
   
   This method (and Dataflow*Start*FlexTemplateOperator) is called "start" flex template, but it appears that it waits for the job to complete.
   
   The existing Dataflow operators do not have this "start" word, and I think the user expectation is that they poll the job to completion. Otherwise you can't do much useful downstream in the DAG without some sensor that waits for the job to complete.
   
   If we want to support both blocking and non-blocking behavior, I'd suggest a `wait_for_done` kwarg that defaults to `True` (the expected behavior, based on similar operators). This might mean we need a new method in the controller, `wait_for_running`, that blocks until the pipeline enters the RUNNING state.
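   Roughly what I have in mind (a simplified sketch, not the actual `_DataflowJobsController` API — the class, the state-polling callable, and the state names here are stand-ins to illustrate the semantics):

   ```python
   import time
   from typing import Callable

   # Stand-ins for Dataflow job states; the real ones live in the Dataflow API.
   JOB_STATE_RUNNING = "JOB_STATE_RUNNING"
   TERMINAL_STATES = {"JOB_STATE_DONE", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}


   class JobsController:
       """Simplified stand-in for _DataflowJobsController."""

       def __init__(self, get_job_state: Callable[[], str], poll_sleep: float = 0.0):
           self._get_job_state = get_job_state
           self._poll_sleep = poll_sleep

       def wait_for_done(self) -> str:
           # Existing behavior: block until the job reaches a terminal state.
           while True:
               state = self._get_job_state()
               if state in TERMINAL_STATES:
                   return state
               time.sleep(self._poll_sleep)

       def wait_for_running(self) -> str:
           # Proposed addition: block only until the job enters RUNNING
           # (or terminates early, e.g. on a failed launch).
           while True:
               state = self._get_job_state()
               if state == JOB_STATE_RUNNING or state in TERMINAL_STATES:
                   return state
               time.sleep(self._poll_sleep)


   def start_flex_template(controller: JobsController, wait_for_done: bool = True) -> str:
       # Sketch of the proposed kwarg: defaulting to True keeps the blocking
       # behavior users expect from the existing Dataflow operators, while
       # wait_for_done=False returns as soon as the pipeline is RUNNING.
       if wait_for_done:
           return controller.wait_for_done()
       return controller.wait_for_running()
   ```

   With `wait_for_done=False` the operator would hand off as soon as the pipeline is live, and a downstream sensor could own the wait-for-completion step.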
   
   What do you think?
   
   same applies for #8553 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
