Hi!

When an upstream task runs complement data, there are two possible mechanisms for triggering its downstream tasks:

1. After the upstream task succeeds, the downstream tasks are triggered automatically and continue the complement run. The trigger is decided by the upstream complement task.
2. Each downstream task is configured with whether it follows the upstream complement run. The trigger is decided by the downstream task itself.
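
To make the difference concrete, here is a minimal Python sketch of the two mechanisms. It is only an illustration, not DolphinScheduler code: the `Task` class, the `follow_upstream_complement` flag, and both functions are hypothetical names I made up for this mail.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    # Hypothetical per-task setting used by mechanism 2: the owner of the
    # downstream task decides whether it follows an upstream complement run.
    follow_upstream_complement: bool = False
    downstream: list[Task] = field(default_factory=list)


def triggered_by_upstream(upstream: Task) -> list[str]:
    """Mechanism 1: the upstream complement run triggers every direct downstream task."""
    return [t.name for t in upstream.downstream]


def triggered_by_downstream_config(upstream: Task) -> list[str]:
    """Mechanism 2: only downstream tasks that opted in are triggered."""
    return [t.name for t in upstream.downstream if t.follow_upstream_complement]


b = Task("B", follow_upstream_complement=True)
c = Task("C")  # the owner of C has not opted in
d = Task("D", follow_upstream_complement=True)
a = Task("A", downstream=[b, c, d])

print(triggered_by_upstream(a))           # ['B', 'C', 'D']
print(triggered_by_downstream_config(a))  # ['B', 'D']
```

With mechanism 1 the upstream run alone determines which tasks are triggered; with mechanism 2 the same upstream run only reaches the tasks whose owners enabled the flag.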

I personally support the second mechanism, for these reasons:

1. Authority and responsibility: whether a task is triggered and executed should be decided by the owner of that task, not by the upstream, because the upstream does not know the execution logic of the downstream task. In practice, the upstream cannot guarantee that every downstream task will behave as expected.
2. Some tasks cannot be re-run at arbitrary times, such as hive2mysql / hive2doris. Re-running such a task leaves the downstream without data for a period of time.
3. Suppose tasks B, C and D are downstream of task A, and all three are upstream of task E. In this case the result of task E is unpredictable: it may run repeatedly, or only once but with wrong data.

Of course, the task rerun configuration can support multiple strategies, such as follow execution, notification, ignore, ...

In the layered structure of a data warehouse, I think only the bottom tables of the fact layer, which have simple dependencies, should be configured to follow execution. I do not recommend configuring follow execution for upper-layer tables with many dependencies, such as the topic layer or app layer, because the result is difficult to predict.

These are my personal views. Welcome to discuss.

--------------------
Apache DolphinScheduler Committer
Hemin Wen 温合民
[email protected]
--------------------
