Re: [DISCUSS]Which part of DolphinScheduler can be more scalable?

leon bao Tue, 03 Nov 2020 06:55:50 -0800

Maybe I didn't express my meaning well.
Yes, DolphinScheduler doesn't need to implement this storage itself. We only
need to provide a log plug-in mechanism so that users can implement their
own log storage, just as we are doing with the alert plug-ins.
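To make the idea concrete, here is a minimal Java sketch of what a log storage plug-in contract could look like. `TaskLogStorage`, `InMemoryTaskLogStorage` and `TaskLogStorageRegistry` are hypothetical names used only for illustration. They are not an existing DolphinScheduler API, and the real design would be settled in the log plug-in discussion.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical SPI for task log storage. The names are illustrative only and
// are not an existing DolphinScheduler API.
interface TaskLogStorage {
    // Identifier that users would set in configuration to select a backend.
    String type();

    // Appends one line of task output for the given task instance.
    void append(int taskInstanceId, String line);

    // Returns all lines recorded so far for the given task instance.
    List<String> read(int taskInstanceId);
}

// Example backend that keeps logs in memory. A real plug-in would write to a
// database or another third-party system instead of local files.
class InMemoryTaskLogStorage implements TaskLogStorage {
    private final Map<Integer, List<String>> logs = new HashMap<>();

    @Override
    public String type() {
        return "memory";
    }

    @Override
    public synchronized void append(int taskInstanceId, String line) {
        logs.computeIfAbsent(taskInstanceId, id -> new ArrayList<>()).add(line);
    }

    @Override
    public synchronized List<String> read(int taskInstanceId) {
        List<String> stored = logs.getOrDefault(taskInstanceId, Collections.emptyList());
        // Return a copy so callers cannot modify the stored log.
        return new ArrayList<>(stored);
    }
}

// Hypothetical registry: master/worker code would look up the configured
// backend by type instead of writing to local files directly.
class TaskLogStorageRegistry {
    private final Map<String, TaskLogStorage> backends = new HashMap<>();

    void register(TaskLogStorage storage) {
        backends.put(storage.type(), storage);
    }

    TaskLogStorage get(String type) {
        TaskLogStorage storage = backends.get(type);
        if (storage == null) {
            throw new IllegalArgumentException(
                    "No log storage plug-in registered for type: " + type);
        }
        return storage;
    }
}
```

With this shape, a worker would call something like `registry.get("memory").append(taskInstanceId, line)`, and the UI would read through the same interface, so logs no longer depend on which master/worker ran the task. In a real plug-in system the backends could be discovered with `java.util.ServiceLoader` rather than registered by hand.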



On Tue, Nov 3, 2020 at 10:37 PM, boyi <[email protected]> wrote:

> Hi,
>
> I agree with CalvinKirs.
>
> Generally, such files are handled by a unified log collection system,
> such as ELK, especially when deployed in a Docker environment. We could
> focus on more important things.
>
> --------------------------------------
> BoYi Zhang
> E-mail: [email protected]
> On 11/3/2020 22:23, CalvinKirs <[email protected]> wrote:
> I understand what you mean, but what I want to ask is whether it would
> be better to leave this part of the work to users. Companies generally
> have their own log collection systems. If we implemented this storage
> ourselves, the workload would be large and the benefit would be small.
>
> Best wishes!
> CalvinKirs
>
>
> On 11/3/2020 22:09, leon bao <[email protected]> wrote:
> @CalvinKirs
>
> About the log SPI: we now have a requirement for elastically scalable
> services (master/worker). In this scenario, task logs cannot be stored on
> any single worker or master; they need to be stored in a third-party
> location, such as a database or other storage. Therefore, if
> DolphinScheduler provides this plug-in capability, users can read and
> write logs according to their own needs.
>
> On Tue, Nov 3, 2020 at 6:40 PM, CalvinKirs <[email protected]> wrote:
>
> Great plan!
> But I have a small question: what exactly are the requirements for the
> log SPI? It is not very clear to me at present. Are we only implementing
> an SPI for data storage? If so, is it necessary? I think users can
> implement this themselves with logagent (or other technologies).
> Different users have different needs: some may also need aggregation or
> computation, operate at very different scales, or use additional
> components. Therefore, if we store the raw data ourselves, a lot of
> redundant data may be generated.
>
> Best wishes!
> CalvinKirs
>
>
> On 11/3/2020 11:47, leon bao <[email protected]> wrote:
> Hello everyone,
>
> DS has good horizontal scalability thanks to its decentralized
> architecture, which attracts many developers. As the number of users
> grows, scheduling requirements are becoming more and more complex, which
> in turn requires DS's functional design to be more extensible; the alert
> plug-in mechanism is one example.
> So let's discuss which parts of DS could be turned into plug-ins. We can
> refactor DolphinScheduler based on the results of this discussion.
> At present, there is demand in the following areas:
>
> - alert plugin (in progress):
> refer to:
> https://github.com/apache/incubator-dolphinscheduler/issues/3049
>
> - task plugin:
> refer to:
> https://github.com/apache/incubator-dolphinscheduler/issues/2869
>
> - registry center:
> refer to:
> https://lists.apache.org/thread.html/r755a57e3b859563de2dddf8aa2f336fcf28934e7bbb2c3f97fe5fe3d%40%3Cdev.dolphinscheduler.apache.org%3E
> https://github.com/apache/incubator-dolphinscheduler/issues/3961
>
> - log plugin:
> Currently, logs are recorded by writing to local files on the server.
> Could we make logging pluggable, so that users can easily add their own
> log read/write backends, such as writing to a database or other
> third-party systems?
>
> - global task queue:
> At present, tasks are stored in the master's in-memory queue, so a
> task's priority only takes effect within a single master.
> To make task priority effective globally, we need a global queue.
> (In version 1.2, we used ZooKeeper as the global queue, but it was
> removed because of the latency of ZK operations.)
>
> Implementation details can be discussed within each topic; here, we only
> discuss the requirements.
> We would greatly appreciate any further opinions.
>
> --
> DolphinScheduler(Incubator)  PPMC
> BaoLiang 鲍亮
> [email protected]
>
>
>
>


