Hi,

+1 for the simple implementation.

By the way, I think HA for the Alert service is much less important than
HA for the Master/Worker servers. Master-standby alert services could be
implemented in the future when needed.

Best Regards
---------------
DolphinScheduler(Incubator) PPMC
Lidong Dai 代立冬
[email protected]
---------------

On Tue, Sep 29, 2020 at 12:01 PM felix <[email protected]> wrote:

> I want to simply implement the logic of adding an exclusive lock when the
> Alert service queries the database, so that even if two alert services are
> started, alerts are not sent more than once.
>
> Original message
> From: [email protected] <[email protected]>
> To: dev <[email protected]>
> Sent: Monday, August 24, 2020, 12:50
> Subject: Re: Re: About the high availability implementation of the Alert
> service
>
> Can a single instance achieve high availability?
> Of course, there are many ways to implement high availability.
>
> [email protected]
>
> From: wu shaoj
> Date: 2020-08-24 12:36
> To: [email protected]
> Subject: Re: About the high availability implementation of the Alert
> service
> I think there's no relationship between stability and multiple instances
> at all.
>
> On 2020/8/24, 11:17, "[email protected]" <[email protected]> wrote:
>
> Just to be clear, what I mentioned is the stability of the Alert Server,
> which is a different requirement from the customization of the alert
> service. Invoking the user's own plugin-based alerts only makes sense
> when the Alert Server itself is up and running. I only agree that this
> can be postponed in the schedule, or I can implement it when I find the
> time. But I do not agree with lowering the stability standard for DS.
>
> [email protected]
>
> From: [email protected]
> Date: 2020-08-24 11:10
> To: dev
> Subject: Re: Re: About the high availability implementation of the Alert
> service
>
> At the very least, the Alert service should support multiple instances,
> so that an exception can be reported as soon as it occurs.
> Customized alerts can be plugins implemented by the user, but the alert
> service is the basis for DS's outgoing alerts, so the stability of this
> service is necessary. No one will accept a scheduling platform that
> cannot raise an alarm when it has a problem.
> Also, it is unreasonable to leave service-level high availability for
> users to implement on their own. This is an architectural design issue,
> not a customization requirement. Service stability is a common
> requirement, not a custom one.
>
> [email protected]
>
> From: wu shaoj
> Date: 2020-08-24 10:50
> To: [email protected]
> Subject: Re: About the high availability implementation of the Alert
> service
> I don't think HA for the alert service is necessary at present or in the
> future. Users can add this extension themselves.
>
> On 2020/8/23, 10:44, "Yichao Yang" <[email protected]> wrote:
> Hi,
> I don't think HA for the alert service is necessary at present. Users
> can add this extension themselves. We should focus on the current
> scheduling.
> Best,
> Yichao Yang
>
> ------------------ Original ------------------
> From: JUN GAO <[email protected]>
> Date: Sat, Aug 22, 2020 9:41 PM
> To: dev <[email protected]>
> Subject: Re: About the high availability implementation of the Alert
> service
> I think the first one is better.
>
> On Sat, Aug 22, 2020 at 19:30, [email protected] <[email protected]> wrote:
>
> > Hi all,
> >
> > I would like to raise an issue: the Alert module is not currently
> > designed for high availability. There are two problems. Duplicate
> > alerts are sent when multiple alert services are started, and DS
> > alerting fails entirely when the alert service goes down.
> >
> > So far, I've come up with two architectures that solve the problem of
> > duplicate alert messages while making the Alert module highly
> > available.
> >
> > 1. The first is a master-slave relationship between the alert
> >    services, coordinated through ZK. Only the master node is
> >    responsible for sending messages. If the master node goes down, a
> >    new master is elected and continues to provide the alert service.
> > 2. The second is a decentralized design in which all alert services
> >    work simultaneously and use exclusive locks between them to ensure
> >    that alert messages are not sent more than once.
> >
> > If anyone has a better plan, we can discuss it together.
> >
> > Thx
> >
> > [email protected]
>
> --
> DolphinScheduler(Incubator) PPMC
> Jun Gao 高俊
> [email protected]
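
The exclusive-lock idea discussed in this thread (several alert services
running at once, each taking a lock before it queries the database) can be
sketched roughly as below. This is only an illustration under stated
assumptions, not DolphinScheduler code: it uses Python's sqlite3 module and
SQLite's `BEGIN EXCLUSIVE` as a stand-in for whatever locking the real
database offers, and the `alert` table, its `WAITING`/`CLAIMED` status
values, and the functions `init_db`, `alert_service`, and `run_demo` are
hypothetical names invented for the example.

```python
import os
import sqlite3
import tempfile
import threading


def init_db(path, n_alerts):
    """Create a hypothetical alert table with n_alerts pending rows."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE alert (id INTEGER PRIMARY KEY, title TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO alert (title, status) VALUES (?, 'WAITING')",
        [(f"alert-{i}",) for i in range(n_alerts)],
    )
    conn.commit()
    conn.close()


def alert_service(path, name, sent, batch_size=3):
    """One alert service instance: claim pending alerts under a lock, then send."""
    # isolation_level=None: BEGIN/COMMIT are issued explicitly below.
    # timeout makes a blocked instance wait for the lock instead of failing.
    conn = sqlite3.connect(path, timeout=30, isolation_level=None)
    try:
        while True:
            # Take the exclusive lock BEFORE reading, so that no other
            # instance can see the same WAITING rows at the same time.
            conn.execute("BEGIN EXCLUSIVE")
            rows = conn.execute(
                "SELECT id, title FROM alert WHERE status = 'WAITING' "
                "ORDER BY id LIMIT ?",
                (batch_size,),
            ).fetchall()
            if not rows:
                conn.execute("COMMIT")
                return
            conn.executemany(
                "UPDATE alert SET status = 'CLAIMED' WHERE id = ?",
                [(alert_id,) for alert_id, _ in rows],
            )
            conn.execute("COMMIT")  # releases the lock
            # "Send" outside the lock; here sending is just recording the alert.
            for alert_id, title in rows:
                sent.append((name, alert_id, title))
    finally:
        conn.close()


def run_demo(n_alerts=20, n_services=2):
    """Run several alert service instances against one database."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "alerts.db")
        init_db(path, n_alerts)
        sent = []
        threads = [
            threading.Thread(
                target=alert_service, args=(path, f"alert-server-{i}", sent)
            )
            for i in range(n_services)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return sent


if __name__ == "__main__":
    result = run_demo()
    ids = sorted(alert_id for _, alert_id, _ in result)
    print(f"{len(result)} alerts sent, {len(set(ids))} distinct ids")
```

Because every instance marks its rows as claimed inside the same
transaction that read them, a second instance that gets the lock afterwards
no longer sees those rows as `WAITING`. The sketch does not cover what
happens if an instance dies after claiming an alert but before sending it;
a real implementation would need some retry or timeout for claimed rows.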
