jaketf commented on a change in pull request #8550:
URL: https://github.com/apache/airflow/pull/8550#discussion_r427500482
##########
File path: airflow/providers/google/cloud/hooks/dataflow.py
##########
@@ -583,6 +623,49 @@ def start_template_dataflow(
         jobs_controller.wait_for_done()
         return response["job"]
+    @GoogleBaseHook.fallback_to_default_project_id
+    def start_flex_template(
+        self,
+        body: Dict,
+        location: str,
+        project_id: str,
+        on_new_job_id_callback: Optional[Callable[[str], None]] = None
+    ):
+        """
+        Starts flex templates with the Dataflow pipeline.
+
+        :param body: The request body
+        :param location: The location of the Dataflow job (for example europe-west1)
+        :type location: str
+        :param project_id: The ID of the GCP project that owns the job.
+            If set to ``None`` or missing, the default project_id from the GCP connection is used.
+        :type project_id: Optional[str]
+        :param on_new_job_id_callback: A callback that is called when a Job ID is detected.
+        :return: the Job
+        """
+        service = self.get_conn()
+        request = service.projects().locations().flexTemplates().launch(  # pylint: disable=no-member
+            projectId=project_id,
+            body=body,
+            location=location
+        )
+        response = request.execute(num_retries=self.num_retries)
+        job_id = response['job']['id']
+
+        if on_new_job_id_callback:
+            on_new_job_id_callback(job_id)
+
+        jobs_controller = _DataflowJobsController(
+            dataflow=self.get_conn(),
+            project_number=project_id,
+            job_id=job_id,
+            location=location,
+            poll_sleep=self.poll_sleep,
+            num_retries=self.num_retries)
+        jobs_controller.wait_for_done()
Review comment:
@mik-laj I was reflecting on this in light of the Data Fusion operator
issue.
The "start" naming is confusing.
This method (and Dataflow*Start*FlexTemplateOperator) is called "start"
flex template, but it appears to wait for the job to complete.
The existing Dataflow operators do not have "start" in their names, and I
think the user expectation is that they poll the job to completion. Otherwise
you can't do much useful downstream in the DAG without a sensor that waits
for the job to complete.
If we want to support both blocking and non-blocking behavior, I'd suggest a
`wait_for_done` kwarg that defaults to `True` (the expected behavior based on
similar operators). This might mean we need a new controller method,
`wait_for_running`, that blocks until the pipeline enters the RUNNING state.
What do you think?
The same applies to #8553.
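
A minimal sketch of the `wait_for_done` idea, to make the suggestion concrete. Everything here is hypothetical: `SketchJobsController` is a stand-in that replays canned states rather than the real `_DataflowJobsController`, `wait_for_running` does not exist yet, and the free function `start_flex_template` only models the branching, not the API call.

```python
import time
from typing import Iterable, Set

RUNNING = "JOB_STATE_RUNNING"
DONE = "JOB_STATE_DONE"
FAILED = "JOB_STATE_FAILED"


class SketchJobsController:
    """Hypothetical stand-in for the jobs controller; replays canned states."""

    def __init__(self, states: Iterable[str], poll_sleep: float = 0.0) -> None:
        self._states = iter(states)
        self._poll_sleep = poll_sleep
        self.state = ""

    def _wait_for(self, targets: Set[str]) -> str:
        # Poll until the job reaches one of the target states or fails.
        for state in self._states:
            self.state = state
            if state == FAILED:
                raise RuntimeError("job failed")
            if state in targets:
                return state
            time.sleep(self._poll_sleep)
        raise RuntimeError("ran out of canned states")

    def wait_for_running(self) -> str:
        # A fast job may already be DONE by the first poll, so accept both.
        return self._wait_for({RUNNING, DONE})

    def wait_for_done(self) -> str:
        return self._wait_for({DONE})


def start_flex_template(controller: SketchJobsController, wait_for_done: bool = True) -> str:
    """Block until completion by default; otherwise return once the job is running."""
    if wait_for_done:
        return controller.wait_for_done()
    return controller.wait_for_running()
```

With `wait_for_done=False` the call returns as soon as the job is running, and a downstream sensor would be responsible for waiting on completion.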