Hi!

When an upstream task runs complement data (backfill), there are two
possible mechanisms for triggering its downstream tasks:
1. After the upstream task executes successfully, the downstream task is
automatically triggered to run complement data as well. The trigger is
decided by the upstream complement data task.
2. The downstream task can be configured to follow, or not follow, the
upstream complement run. The trigger is decided by the downstream task
itself.

I personally support the second mechanism, for these reasons:
1. In terms of authority and responsibility, whether a task is triggered
and executed should be decided by the owner of that task, not by the
upstream, because the upstream does not know the execution logic of the
downstream task. In practice, the upstream cannot guarantee that every
downstream task will behave as expected.
2. Some tasks cannot be re-run at arbitrary times, such as hive2mysql /
hive2doris. Re-running them leaves the downstream without data for a period
of time.
3. Suppose tasks B, C, and D are downstream of task A, and task E is
downstream of B, C, and D. In this case the result of task E is
unpredictable: it may run repeatedly, or it may run only once but with
wrong data.
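
To make reason 3 concrete, here is a small simulation of the A -> B/C/D -> E
case. It is only a sketch, not DolphinScheduler code: the DOWNSTREAM table,
the simulate() function, and the "dedupe" flag are all hypothetical, and I
assume each task triggers its downstream tasks immediately when it finishes
(modeled with a stack).

```python
# Hypothetical dependency graph from the example: A -> B, C, D -> E.
DOWNSTREAM = {"A": ["B", "C", "D"], "B": ["E"], "C": ["E"], "D": ["E"], "E": []}
UPSTREAM_OF_E = ("B", "C", "D")


def simulate(dedupe):
    """Upstream-driven triggering: every finished task triggers its downstream.

    Returns, for each run of E, which of B/C/D had already finished.
    With dedupe=True a task that has already started is not started again.
    """
    finished = []   # tasks in completion order
    started = set()
    e_inputs = []   # state of E's upstreams at each start of E
    stack = ["A"]
    while stack:
        task = stack.pop()
        if dedupe and task in started:
            continue
        started.add(task)
        if task == "E":
            e_inputs.append([t for t in finished if t in UPSTREAM_OF_E])
        finished.append(task)
        # Assumption: downstream tasks are triggered right away, first one first.
        stack.extend(reversed(DOWNSTREAM[task]))
    return e_inputs


print(simulate(dedupe=False))  # E runs three times
print(simulate(dedupe=True))   # E runs once, before C and D have finished
```

Without deduplication E is executed once per upstream; with deduplication it
runs only once, but at that moment only B has been re-run, so its input is
incomplete.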

Of course, the task rerun configuration can support multiple strategies,
such as follow execution, notification, and ignore ...
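
As a rough sketch of what such a downstream-side setting could look like
(the names RerunStrategy and on_upstream_complement are hypothetical and are
not part of the current DolphinScheduler API):

```python
from enum import Enum


class RerunStrategy(Enum):
    """Hypothetical per-task setting chosen by the downstream task owner."""
    FOLLOW = "follow"   # re-run together with the upstream complement
    NOTIFY = "notify"   # do not re-run, only tell the task owner
    IGNORE = "ignore"   # do nothing


def on_upstream_complement(task_name, strategy, date):
    """Decide what the downstream task does when its upstream is re-run.

    Returns a description of the action, or None when nothing happens.
    """
    if strategy is RerunStrategy.FOLLOW:
        return f"run {task_name} for {date}"
    if strategy is RerunStrategy.NOTIFY:
        return f"notify owner of {task_name}: upstream was re-run for {date}"
    return None  # IGNORE


print(on_upstream_complement("dwd_orders", RerunStrategy.FOLLOW, "2021-01-01"))
print(on_upstream_complement("hive2mysql", RerunStrategy.NOTIFY, "2021-01-01"))
```

The point of the design is that the decision lives in the downstream task's
own configuration, so a task like hive2mysql can opt out of automatic
re-runs.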

In the layered structure of a data warehouse, I think only the bottom
tables of the fact layer, which have simple dependencies, should be
configured to follow upstream execution. I would not recommend configuring
follow execution for upper-layer tables with more dependencies, such as the
topic layer or app layer, because the result is difficult to predict.

These are my personal views.
Comments and discussion are welcome.

--------------------
Apache DolphinScheduler Committer
Hemin Wen  温合民
[email protected]
--------------------
