ruanwenjun commented on issue #14832:
URL: https://github.com/apache/dolphinscheduler/issues/14832#issuecomment-1705845077

   > > > @ruanwenjun @zixi0825
   > > > 
   > > > 1. Do we need **global alarm types**? That means the alert instance accepts all events by default, without being bound to a specific workflow.
   > > > 2. Overhead: Adding a global alarm type will incur extra overhead. We need to query the database to see whether there is a global alert instance before creating events. For example, when a workflow starts, we first query the database, and if there are any global alert instances, we create a WorkflowStart event and save it in the database.
   > > 
   > > 
   > > I am not clear on why we need to `query if there exist xx event before workflow start`. If we want to send the workflow start event, we just write an event record when the workflow starts. Do you mean that when we start a workflow twice we only send one event? That would be unreasonable. In addition, this kind of work is done by the alert server, so it will not add overhead.
   > 
   > We query the database to check whether **global alert instances** exist, rather than to `query if there exist xx events`, before creating events. Events are only generated when global alert instances actually exist. Take a workflowEnd event as an example: in the current design, when a workflow is initiated, we bind an alert group to the workflow instance. Therefore, when the workflow execution ends, we can find the alert strategy and alert group directly from the `WorkflowInstance`, and if the alert strategy meets the requirement and an alert group is bound to the instance, we create the workflowEndEvent. With the addition of global alarm types, global alert instances are not bound to workflow instances. So, when workflow execution ends (or begins, or in other scenarios that may generate events), it is necessary to query the database to check whether any global alert instances exist. Of course, this process can be optimized. For instance, we can query for the alert instances only once, when constructing the `WorkflowExecuteRunnable`, rather than querying the database each time an event might be generated.
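   The optimization described above could look roughly like the following sketch: the global-alert-instance query runs once at construction time, and each event site only checks a cached flag. All names here are illustrative, not the actual DolphinScheduler API.

   ```java
   import java.util.List;
   import java.util.function.Supplier;

   // Hypothetical sketch: cache the global-alert-instance lookup once, at
   // WorkflowExecuteRunnable construction time, instead of querying the
   // database on every potential event.
   class GlobalAlertCheck {
       private final boolean hasGlobalAlertInstances;

       // 'queryGlobalInstances' stands in for a single database query that
       // returns the configured global alert instances, if any.
       GlobalAlertCheck(Supplier<List<String>> queryGlobalInstances) {
           this.hasGlobalAlertInstances = !queryGlobalInstances.get().isEmpty();
       }

       // Called at each event site (workflow start/end, ...); no DB access here.
       boolean shouldGenerateEvent() {
           return hasGlobalAlertInstances;
       }
   }
   ```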
   
   The event generator doesn't need to care about whether there exists an event consumer.
   
   > 
   > > > 2. Do we need **more alarm event types**? Such as workflowAddEvent/workflowUpdateEvent/workflowDeleteEvent/workflowStartEvent, etc.
   > > > 3. **Flexibility:** In the current alert module, the title and content are determined when the master server creates an alert, so the message format is the same across different alert plugins. Do we need more flexibility? Similar to a KafkaListener, plugins could generate messages in different formats and perform different processing logic for different event types.
   > > 
   > > 
   > > In fact, the AlertRecord only needs to contain some metadata; the content/title and anything else should be generated by the alert sender (kafka/email/...).
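   A minimal sketch of that split, with illustrative names only (not the real DolphinScheduler types): the record carries metadata, and each sender plugin renders its own format.

   ```java
   // Hypothetical: the record holds only metadata about what happened.
   record AlertRecord(String workflowName, String eventType, long timestamp) {}

   // Each sender plugin decides how to render the record.
   interface AlertSender {
       String render(AlertRecord record);
   }

   class EmailSender implements AlertSender {
       public String render(AlertRecord r) {
           // Human-readable subject line for mail.
           return "Subject: [" + r.eventType() + "] " + r.workflowName();
       }
   }

   class KafkaSender implements AlertSender {
       public String render(AlertRecord r) {
           // Compact JSON for downstream consumers.
           return "{\"workflow\":\"" + r.workflowName()
                   + "\",\"event\":\"" + r.eventType() + "\"}";
       }
   }
   ```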
   > 
   > You are right. The current design of the sender is flexible enough.
   > 
   > > > 4. **Failed Alert Messages:** Do we consider resending alert messages after they fail to send?
   > > >    
   > > >    1. If we resend the failed messages: to ensure that messages sent by the same alert instance stay in order (e.g., workflowStartEvent should precede workflowEndEvent) without affecting other instances, the current message processing method needs to be changed; each alert instance should process its events in chronological order.
   > > 
   > > 
   > > Right now, the alert server uses a single loop thread to iterate over the events, so this is guaranteed.
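   The ordering guarantee from a single loop thread can be sketched as follows; this is only an illustration of the principle (one thread drains a FIFO queue, so dispatch order equals submission order), not the actual alert server code.

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.concurrent.BlockingQueue;
   import java.util.concurrent.LinkedBlockingQueue;

   // Minimal sketch: all events go through one queue drained by one loop
   // thread, so they are dispatched in the order they were enqueued.
   class AlertEventLoop {
       private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
       final List<String> dispatched = new ArrayList<>();

       void submit(String event) {
           pending.add(event);
       }

       // The single loop thread calls this; FIFO order is preserved.
       void drain() {
           String e;
           while ((e = pending.poll()) != null) {
               dispatched.add(e); // stand-in for "send alert"
           }
       }
   }
   ```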
   > 
   > The alert server only processes PendingAlerts and does not handle FailedAlerts, which ensures the order of events. But if we want to **retry FailedAlerts**, things become more complex.
   > 
   > * Out-of-order messages: For alert instance `instanceA`, if the `process1` start event fails to send while the `process1` end event succeeds, then retrying the `process1` start event and having it succeed results in out-of-order events.
   > * Availability: If the messages for alert instance `instanceA` continuously fail to send and a large number of events accumulate, it can impact other alert instances' ability to send messages.
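   One way to address both concerns, sketched below under the assumption of per-instance queues (illustrative names, not an actual proposal in the codebase): each alert instance keeps its own FIFO queue, a failed head event stays at the front so its retry still precedes later events, and a backlog in one instance's queue cannot block other instances.

   ```java
   import java.util.ArrayDeque;
   import java.util.ArrayList;
   import java.util.Deque;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;
   import java.util.function.Predicate;

   // Hypothetical per-instance ordered retry: one FIFO queue per alert
   // instance isolates failures and preserves event order on retry.
   class PerInstanceRetryQueues {
       private final Map<String, Deque<String>> queues = new HashMap<>();

       void enqueue(String instanceId, String event) {
           queues.computeIfAbsent(instanceId, k -> new ArrayDeque<>()).add(event);
       }

       // Try to send from the head; on failure the event stays in place,
       // so its retry still happens before any later event of the same
       // instance. Other instances' queues are untouched.
       List<String> drain(String instanceId, Predicate<String> trySend) {
           List<String> sent = new ArrayList<>();
           Deque<String> q = queues.getOrDefault(instanceId, new ArrayDeque<>());
           while (!q.isEmpty() && trySend.test(q.peek())) {
               sent.add(q.poll());
           }
           return sent;
       }
   }
   ```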
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
