One more point: what I raised is the stability of the Alert Server, which is a different requirement from customization of the alert service. Invoking a user's own alerts, implemented through a plug-in, only makes sense when the Alert Server itself is up and running normally. I can agree to postpone this in the schedule, or I can implement it myself when I find the time. But I do not agree to lowering the stability standard for DS.
[email protected]

From: [email protected]
Date: 2020-08-24 11:10
To: dev
Subject: Re: Re: About the high availability implementation of the Alert service

At the very least, the Alert service should support running as multiple instances, so that an exception can be reported as soon as it occurs.
Customized alerts can be plug-ins implemented by the user, but the Alert service is the foundation for DS's outgoing alerts, so the stability of this service is essential. No one will accept a scheduling platform that cannot raise an alarm when something goes wrong with it.
Also, it is unreasonable to leave service-level high availability for users to implement on their own. This is an architectural design issue, not a customization requirement. Service stability is a common requirement, not a custom one.

[email protected]

From: wu shaoj
Date: 2020-08-24 10:50
To: [email protected]
Subject: Re: About the high availability implementation of the Alert service

I don't think HA for the alert service is necessary, at present or in the future. Users can add this extension themselves.

On 2020/8/23, 10:44, "Yichao Yang" <[email protected]> wrote:

Hi,

I don't think HA for the alert service is necessary at present. Users can add this extension themselves. We should focus on the current scheduling.

Best,
Yichao Yang

------------------ Original ------------------
From: JUN GAO <[email protected]>
Date: Sat, Aug 22, 2020 9:41 PM
To: dev <[email protected]>
Subject: Re: About the high availability implementation of the Alert service

I think the first one is better.
On Sat, Aug 22, 2020 at 19:30, [email protected] <[email protected]> wrote:

> Hi all,
>
> I would like to make a suggestion. The Alert module is not currently
> designed to be highly available, and there are two problems: when multiple
> alert services are started, alerts are sent repeatedly; and when the alert
> service goes down, DS alerting stops working.
> So far, I have come up with two architectures that solve the problem of
> repeated alert messages while also making the Alert module highly
> available.
>
> 1. The first is a master-slave relationship between the alert services,
> implemented through ZK. Only the master node is responsible for sending
> messages. After the master node goes down, a new master is elected, and
> the new master node continues to provide the alert service.
> 2. The second is a decentralized design in which all alert services work
> simultaneously, coordinated through exclusive locks between them, so that
> alert messages are not sent more than once.
>
> If anyone has a better plan, we can discuss it together.
>
> Thx
>
> [email protected]

--
DolphinScheduler(Incubator) PPMC
Jun Gao 高俊
[email protected]
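
For anyone comparing the two proposals in the original message, here is a rough in-memory sketch of both. It is only an illustration under stated assumptions, not DolphinScheduler code: `AlertStore`, `Registry`, `run_master_slave`, and `run_decentralised` are hypothetical names, the registry merely imitates ZK-style election (the earliest live joiner is master), and `claim()` stands in for whatever exclusive lock the alert services would actually share.

```python
import threading


class AlertStore:
    """Hypothetical stand-in for the shared queue of pending alerts."""

    def __init__(self, alert_ids):
        self._lock = threading.Lock()
        self._pending = set(alert_ids)

    def pending(self):
        with self._lock:
            return sorted(self._pending)

    def claim(self, alert_id):
        """Exclusive claim: returns True for exactly one caller per alert."""
        with self._lock:
            if alert_id in self._pending:
                self._pending.remove(alert_id)
                return True
            return False


class Registry:
    """Hypothetical stand-in for ZK: the earliest live joiner is the master."""

    def __init__(self):
        self._next_seq = 0
        self._live = {}  # instance name -> join sequence number

    def join(self, name):
        self._live[name] = self._next_seq
        self._next_seq += 1

    def leave(self, name):
        self._live.pop(name, None)

    def master(self):
        if not self._live:
            return None
        return min(self._live, key=self._live.get)


def run_master_slave(registry, store, instances, sent):
    """Option 1: every instance is running, but only the master sends."""
    for name in instances:
        if registry.master() != name:
            continue  # slaves stay idle until they are elected
        for alert_id in store.pending():
            if store.claim(alert_id):
                sent.append((name, alert_id))


def run_decentralised(store, instances, sent):
    """Option 2: all instances send concurrently; claim() avoids duplicates."""
    sent_lock = threading.Lock()

    def worker(name):
        for alert_id in store.pending():
            if store.claim(alert_id):
                with sent_lock:
                    sent.append((name, alert_id))

    threads = [threading.Thread(target=worker, args=(n,)) for n in instances]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

In the first variant a failover is just `registry.leave("alert-1")` followed by another polling round, after which the next instance sends. In the second variant no election is needed, but every send pays for one claim on the shared lock.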
