TobKed commented on a change in pull request #11726:
URL: https://github.com/apache/airflow/pull/11726#discussion_r517399496
##########
File path: airflow/providers/google/cloud/operators/dataflow.py
##########
@@ -324,6 +344,23 @@ class DataflowTemplatedJobStartOperator(BaseOperator):
`https://cloud.google.com/dataflow/pipelines/specifying-exec-params
<https://cloud.google.com/dataflow/docs/reference/rest/v1b3/RuntimeEnvironment>`__
:type environment: Optional[dict]
+ :param wait_until_finished: (Optional)
+ If True, wait for the end of pipeline execution before exiting. If False,
+ it only waits for it to start (``JOB_STATE_RUNNING``).
+
+ The default behavior depends on the type of pipeline:
+
+ * for the streaming pipeline, wait for jobs to start,
+ * for the batch pipeline, wait for the jobs to complete.
+
+ .. warning::
+
You cannot call ``PipelineResult.wait_until_finish`` method in your pipeline code for the operator
Review comment:
I partially agree, but not entirely.
The default behaviour is:
- for a streaming pipeline, wait for the job to start,
- for a batch pipeline, wait for the job to complete.
But there may be specific cases like:
- the user doesn't want to wait for the end of a batch job and knows for sure
that the templated batch job does not call `wait_until_finish`,
- the user wants to wait until a streaming job is cancelled/drained (e.g. by
some external API call or the web UI).
This would give conscious Dataflow users the possibility of building more
flexible DAGs if needed. What do you think about it?
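To make the proposed semantics concrete, the decision could be sketched roughly like this (a hypothetical illustration only, not the operator's actual implementation; `should_wait_for_completion` and its parameters are invented for this example):

```python
def should_wait_for_completion(is_streaming: bool, wait_until_finished: bool = None) -> bool:
    """Decide whether the operator should block until the Dataflow job finishes.

    ``wait_until_finished=None`` keeps today's defaults; an explicit
    True/False lets the user override them for the special cases above.
    """
    if wait_until_finished is not None:
        # Explicit user choice wins: e.g. skip waiting for a batch job,
        # or keep waiting for a streaming job until it is cancelled/drained.
        return wait_until_finished
    # Default behaviour: batch jobs wait for completion,
    # streaming jobs only wait for JOB_STATE_RUNNING.
    return not is_streaming
```

With `wait_until_finished=None` as the default, existing DAGs keep their current behaviour, while the two special cases above become opt-in overrides.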
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]