[GitHub] [incubator-dolphinscheduler] wangsvip opened a new issue #1307: Inter-project task dependencies

GitBox Thu, 21 Nov 2019 01:04:05 -0800

wangsvip opened a new issue #1307: Inter-project task dependencies
URL: https://github.com/apache/incubator-dolphinscheduler/issues/1307
 
 
   
   
   Example: for a data warehouse we create a single account, but the work under
that account is split across several teams, and each team creates its own
project. When team A's tasks finish, they trigger team B's tasks, which in turn
trigger team C's tasks. This avoids the earlier setup in which A, B and C were
all scheduled tasks: A runs at 1:00, B at 2:00 and C at 3:00. If A's data does
not arrive today, B still runs at 2:00 and C still runs at 3:00. Isn't running
without data just a waste of cluster resources?

   Another example: the data warehouse is divided into three layers, ods, dw
and dm, and each layer is a project. I create these three projects under one
account, and each layer is handled by its own group of colleagues. I need the
ods layer to receive today's data, then trigger the dw-layer cleaning, which
then triggers the dm-layer computation. Each layer is tightly linked to the
next and depends on it, and the dependency is cross-project. It should not be,
as you recommend, mashed together by forcing everything into one workflow,
odsTask ----> dwTask ----> dmTask. Wouldn't that be chaos? The ods colleagues
finish their workflow, then the dw colleagues append a layer behind it, and
then the dm colleagues append another? That is clearly wrong. A company has
many teams, so how could cross-team tasks like these be squeezed into a single
workflow?

   Current solution: since dolphinscheduler has no such task dependency, when
the ods-layer data arrives we create a status file in a directory, or record a
completion flag in a status table in MySQL. The next layer's computation first
checks the status flag written by the previous layer and then decides whether
to run the current task.
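
   The status-table workaround described above can be sketched roughly as
follows. This is only an illustration under stated assumptions: it uses
Python's built-in sqlite3 as a stand-in for MySQL, and the table name
layer_status and all function names are hypothetical, not part of
dolphinscheduler.

```python
import sqlite3


def init_status_table(conn):
    # Hypothetical status table: one row per (layer, data_date) that has finished.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS layer_status ("
        " layer TEXT NOT NULL,"
        " data_date TEXT NOT NULL,"
        " PRIMARY KEY (layer, data_date))"
    )


def mark_done(conn, layer, data_date):
    # Record the completion flag for a layer; repeated calls are harmless.
    conn.execute(
        "INSERT OR IGNORE INTO layer_status (layer, data_date) VALUES (?, ?)",
        (layer, data_date),
    )
    conn.commit()


def is_done(conn, layer, data_date):
    # Check whether a layer has written its completion flag for that date.
    row = conn.execute(
        "SELECT 1 FROM layer_status WHERE layer = ? AND data_date = ?",
        (layer, data_date),
    ).fetchone()
    return row is not None


def run_if_upstream_done(conn, upstream, layer, data_date, task):
    # Skip the task when the upstream layer has not finished for this date.
    if not is_done(conn, upstream, data_date):
        return False
    task()
    mark_done(conn, layer, data_date)
    return True
```

   With this, the dw project would call run_if_upstream_done(conn, "ods",
"dw", date, task) at its scheduled time and simply do nothing when the ods
flag is missing. It still polls on a timer, which is the limitation a real
cross-project dependency would remove.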
   
   Conclusion: companies have rules and divide work among teams, so it is
impossible to merge every team's tasks into one workflow!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
