Hi DolphinScheduler community,

At 19:00 on 2020-11-11 (Beijing time) we held a meeting to discuss the issues affecting DolphinScheduler's performance, scalability and stability. More than 20 community members took part. The results of the discussion are as follows:

1: [SPI design and module split: decided]
Each module will provide plug-in support to improve its extensibility, for example the registry center, the alert center (coming soon), the global queue, logging, tasks, etc.

@gaojun_2048 walked us through the SPI design and the points to watch out for: keep the SPI design simple and do not depend on too many third-party JARs. Each plug-in stays independent. This may lead to some redundant code; in that case each plug-in writes its own utility classes.
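The SPI guidelines above (a simple contract, no heavy third-party dependencies, independent plug-ins that carry their own helpers) might look roughly like the following in Java. This is only a sketch under assumptions: `TaskPlugin`, `ShellTaskPlugin` and `TaskPluginRegistry` are hypothetical names made up for illustration and are not DolphinScheduler's actual interfaces. The only real API used is the JDK's `java.util.ServiceLoader`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;

// Hypothetical SPI contract (illustration only, not the real DS interface).
interface TaskPlugin {
    String type();                      // task type this plug-in handles, e.g. "shell"
    String run(String taskDefinition);  // returns a short description of what was run
}

// A plug-in keeps its own small helper instead of depending on a shared utility JAR.
final class ShellTaskPlugin implements TaskPlugin {
    @Override
    public String type() {
        return "shell";
    }

    @Override
    public String run(String taskDefinition) {
        return "shell would execute: " + trimToEmpty(taskDefinition);
    }

    // Deliberately duplicated per plug-in, as the guideline allows.
    private static String trimToEmpty(String s) {
        return s == null ? "" : s.trim();
    }
}

final class TaskPluginRegistry {
    private final Map<String, TaskPlugin> plugins = new HashMap<>();

    // Accepts any Iterable, so a ServiceLoader or a plain list can be passed in.
    void registerAll(Iterable<? extends TaskPlugin> discovered) {
        for (TaskPlugin plugin : discovered) {
            plugins.put(plugin.type(), plugin);
        }
    }

    // Discovers providers declared in META-INF/services/<interface name> on the classpath.
    static TaskPluginRegistry fromClasspath() {
        TaskPluginRegistry registry = new TaskPluginRegistry();
        registry.registerAll(ServiceLoader.load(TaskPlugin.class));
        return registry;
    }

    String run(String type, String taskDefinition) {
        TaskPlugin plugin = plugins.get(type);
        if (plugin == null) {
            throw new IllegalArgumentException("no plugin for task type: " + type);
        }
        return plugin.run(taskDefinition);
    }
}
```

In this sketch the core only knows the `TaskPlugin` interface. A new task type is added by shipping a JAR with an implementation and a provider entry under `META-INF/services`, without touching the registry code.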
We are also very grateful to the @baoxin company for sharing how they reworked task plug-ins in their real-world deployment; it coincides with our task plug-in SPI. Once DS tasks are pluginized through the SPI and the shell task plug-in is implemented on top of it, the task plug-ins provided by @baoxin will all run through this shell task type plug-in. The community can also implement other task types on top of the SPI, such as plain Java tasks, HTTP tasks, MR tasks and so on.

In addition, @gaojun-2048 pointed out that the DS design should be as concise as possible and foolproof for users, to reduce the cost of understanding it.

Email discussion link: https://lists.apache.org/thread.html/r6ea7be489040c90dc14d22352a65ef7dfd67a1ff3145a8b9e7b72f69%40%3Cdev.dolphinscheduler.apache.org%3E

2: (TBD) Generating DAGs dynamically from SQL
Many thanks to @Rubik-W for the design and suggestions. The design still needs to be improved, for example in functional completeness and generality.
Future mail discussion: https://lists.apache.org/thread.html/r7dc0647f1e66a688dcfb262cd438efde5367d801ad469d12f5f772ba%40%3Cdev.dolphinscheduler.apache.org%3E
We look forward to community members working with @Rubik-W to improve this design.

3: [future] Metrics (measurement parameters)

4: List-based task dependency (requirements confirmed)
@boyi_zhang, @zixi0825, @lidongdai, @AhahaGe and @Rubik-W described the background of the list task requirement for us: each data warehouse engineer only cares about their own workflows. In addition, the page "explodes" when a large number of nodes are displayed in a single workflow, which makes troubleshooting difficult.

5: Performance issues
@leonbao ran a local test of DS for us and raised the following issues.
Run log (attachment: run log analysis.log): https://uploader.shimo.im/f/GKLOERMZKiL2M8Cw.log
The environment was a local Mac with locally started ZooKeeper, MySQL and the relevant DS master, worker and API services, so network communication time is not included.
1. api -> master communication: 300+ ms
2. Create process instance (parsing JSON): 300 ms
3. Build DAG (parsing JSON): 800-ms
4. Master sends task -> worker receives task instruction: 820 ms
5. Worker receives task instruction -> executes command: 200+ ms

@leonbao believes we should refactor the master to reduce database polling and thread usage. This is indeed a problem; we will propose optimization plans and discuss them with everyone at the third meeting. Community members are welcome to take an active part.

We are very grateful to the following friends for joining the discussion and putting forward many useful suggestions: dailidong, lgcareer, CalvinKirs, Rubik-W, leonbao, zixi0825, AhahaGe, GaoJun, BoYiZhang, samz406, hepaticayu, chenxingchun and others.
The community also hopes that more people will get involved. Thank you very much.

Best wishes!
CalvinKirs