+1. This feature is more like a Python SDK; do we need to create a new repository to maintain it?
Jiajie Zhong <[email protected]> wrote on Tue, Sep 28, 2021 at 11:42 AM:

> Hey guys,
>
> Apache DolphinScheduler is a good tool for workflow scheduling: it's
> easy to extend, distributed, and has a nice UI to create and maintain
> workflows. Our workflows can only be defined in the UI, which is easy
> to use and user friendly. That is good, but it could be better if we
> added an extended API so workflows could be defined as code or as YAML
> files. Since YAML files are hard to maintain manually, I think it is
> better to define workflows in code, aka workflows-as-code.
>
> When workflow definitions are code, we can easily modify configuration
> and make batch changes. It becomes easier to define similar tasks with
> a loop statement, and it gives us the ability to add unit tests for
> workflows too. I hope Apache DolphinScheduler can combine the benefits
> of defining by code and by UI, so I am raising a proposal to add
> workflows-as-code to Apache DolphinScheduler.
>
> Actually, I have already started with a POC PR[1]. In this PR, I add a
> Python API that lets users define workflows in Python code. This
> feature uses *Py4J* to connect Java and Python, which means I did not
> add any new database model or infrastructure to Apache
> DolphinScheduler; I just reuse the service layer in the
> dolphinscheduler-api package to create workflows. We can consider the
> Python API just another interface to Apache DolphinScheduler, like our
> UI, which lets users define and maintain workflows following its
> rules.
> Here is a tutorial workflow definition using the Python API, which you
> can find in the PR files[2]:
>
> ```python
> from pydolphinscheduler.core.process_definition import ProcessDefinition
> from pydolphinscheduler.tasks.shell import Shell
>
> with ProcessDefinition(name="tutorial") as pd:
>     task_parent = Shell(name="task_parent", command="echo hello pydolphinscheduler")
>     task_child_one = Shell(name="task_child_one", command="echo 'child one'")
>     task_child_two = Shell(name="task_child_two", command="echo 'child two'")
>     task_union = Shell(name="task_union", command="echo union")
>
>     task_group = [task_child_one, task_child_two]
>     task_parent.set_downstream(task_group)
>
>     task_union << task_group
>
>     pd.run()
> ```
>
> In the tutorial, we define a new ProcessDefinition named 'tutorial'
> using a Python context manager, and then we add four Shell tasks to
> 'tutorial': in just five lines we create one process definition with
> four tasks.
>
> Besides the process definition and tasks, another thing we have to add
> to a workflow is task dependencies. We add the functions
> `set_downstream` and `set_upstream` to describe task dependencies, and
> we also overload the bitshift operators `>>` and `<<` as shortcuts for
> them.
>
> After the dependencies are set, our workflow definition is done, but
> everything is still on the Python API side, which means nothing is
> persisted to the Apache DolphinScheduler database, and the workflow
> cannot run in Apache DolphinScheduler until we call `pd.submit()` or
> run it directly with `pd.run()`.
>
> [1]: https://github.com/apache/dolphinscheduler/pull/6269
> [2]: https://github.com/apache/dolphinscheduler/pull/6269/files#diff-5561fec6b57cc611bee2b0d8f030965d76bdd202801d9f8a1e2e74c21769bc41
>
> Best Wishes
> Jiajie
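The loop-based task definition and the `>>`/`<<` dependency shortcuts mentioned in the proposal can be sketched in plain Python. The `Task` class below is a hypothetical stand-in to illustrate the idea, not the actual pydolphinscheduler implementation:

```python
# Minimal sketch of workflows-as-code dependency wiring.
# `Task` is a hypothetical stand-in, not pydolphinscheduler's real class.

class Task:
    def __init__(self, name):
        self.name = name
        self.upstream = set()
        self.downstream = set()

    def set_downstream(self, tasks):
        # Accept a single task or a list/tuple of tasks.
        for task in tasks if isinstance(tasks, (list, tuple)) else [tasks]:
            self.downstream.add(task)
            task.upstream.add(self)

    def set_upstream(self, tasks):
        for task in tasks if isinstance(tasks, (list, tuple)) else [tasks]:
            self.upstream.add(task)
            task.downstream.add(self)

    def __rshift__(self, other):
        # self >> other: `other` runs after `self`.
        self.set_downstream(other)
        return other

    def __lshift__(self, other):
        # self << other: `other` runs before `self`.
        self.set_upstream(other)
        return other


# Define similar tasks with a loop, as the proposal suggests.
parent = Task("task_parent")
children = [Task(f"task_child_{i}") for i in range(3)]
union = Task("task_union")

parent.set_downstream(children)
union << children  # `union` depends on every child task
```

Because the dependency graph is just Python objects, this is also what makes the workflows unit-testable: a test can assert on `upstream`/`downstream` sets before anything is submitted to the scheduler.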
