czeming opened a new pull request, #9371:
URL: https://github.com/apache/dolphinscheduler/pull/9371

   
   
   ## Purpose of the pull request
   
   This PR closes https://github.com/apache/dolphinscheduler/issues/9174.
   1. Add a time range for alarm suppression.
   2. Optimize the query logic.
   
   See https://github.com/apache/dolphinscheduler/commit/f8ecb536b71d6f33b71c73930832b62890b84ea1
   
   ## Supplement
   
   I read commit `https://github.com/apache/dolphinscheduler/commit/f8ecb536b71d6f33b71c73930832b62890b84ea1` and the class `org.apache.dolphinscheduler.dao.AlertDao`.
   
   That commit modified the method `org.apache.dolphinscheduler.dao.AlertDao#sendServerStopedAlert` and left the other logic unchanged. I assume the scenario its author faces is simply reducing the frequency of repeated server-down alarms within a time period, so comparing the alert content for equality is the actual requirement. Adding the message ID to the condition would defeat the logic he added and make the check meaningless, because multiple server-down events for the same host rarely occur at the same point in time, so their IDs will obviously differ.
   
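   In addition, if my understanding is correct, the commit itself has a latent problem. The condition below has no time bound, and the content is built only from the server type, host, event, and warning level, so it is identical for every crash of the same host. Once a row with that content and status exists, later crashes of that host will never be notified.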
   
   ```
   where content = #{alert.content} and alert_status = #{alert.alertStatus.code}
   ```
   and
   ```
   ServerAlertContent serverStopAlertContent = ServerAlertContent.newBuilder()
           .type(serverType)
           .host(host)
           .event(AlertEvent.SERVER_DOWN)
           .warningLevel(AlertWarnLevel.SERIOUS)
           .build();
   String content = JSONUtils.toJsonString(Lists.newArrayList(serverStopAlertContent));
   ```
   
   To sum up, the needs of both parties are:
   1. A server-down event for the same host is not notified repeatedly within a period of time.
   2. Processing efficiency is improved.
   
   A low-cost fix is to change the SQL condition to
   ```
   where content = #{alert.content} and alert_status = #{alert.alertStatus.code} and create_time >= ${No alarm start time}
   ```
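
To make the intended behavior concrete, here is a minimal in-memory sketch of the time-bounded check. It is not the actual `AlertDao` code: the class `AlertSuppressionSketch`, its nested `AlertRow`, and the `suppressionWindow` parameter are hypothetical names, and the list stands in for table `t_ds_alert`. It also assumes the stored status stays equal to the status being queried.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the duplicate check against t_ds_alert.
final class AlertSuppressionSketch {

    // Only the columns used by the proposed WHERE clause.
    static final class AlertRow {
        final String content;
        final int alertStatus;
        final Instant createTime;

        AlertRow(String content, int alertStatus, Instant createTime) {
            this.content = content;
            this.alertStatus = alertStatus;
            this.createTime = createTime;
        }
    }

    private final List<AlertRow> rows = new ArrayList<>();
    private final Duration suppressionWindow; // assumed to be configurable

    AlertSuppressionSketch(Duration suppressionWindow) {
        this.suppressionWindow = suppressionWindow;
    }

    // Mirrors: content = ? and alert_status = ? and create_time >= windowStart
    boolean existsRecent(String content, int alertStatus, Instant now) {
        Instant windowStart = now.minus(suppressionWindow);
        for (AlertRow row : rows) {
            if (row.content.equals(content)
                    && row.alertStatus == alertStatus
                    && !row.createTime.isBefore(windowStart)) {
                return true;
            }
        }
        return false;
    }

    // Records the alert unless an identical one exists inside the window.
    boolean sendIfNotSuppressed(String content, int alertStatus, Instant now) {
        if (existsRecent(content, alertStatus, now)) {
            return false;
        }
        rows.add(new AlertRow(content, alertStatus, now));
        return true;
    }
}
```

With a 10-minute window, a second identical alert one minute later is suppressed, while one arriving after the window has passed is sent again, which is the behavior the unbounded condition cannot provide.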
   
   However, if table `t_ds_alert` holds more alarms in the future, its data volume will grow. Even with the `create_time` field in the condition, an equality comparison on the `content` field still leaves room for optimization.
   
   We could add a field that stores a signature computed from the content, combined with any other information that identifies the message, and use that field as the basis for the duplicate check.
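
As a rough sketch of the signature idea, the content string could be hashed to a fixed-length hex value that is cheap to index and compare. The class name `AlertSignatureSketch`, the method `sign`, and the choice of SHA-256 are assumptions made for illustration; the PR does not prescribe a particular algorithm or column name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives a fixed-length signature from alert content.
final class AlertSignatureSketch {

    // Returns the lowercase hex SHA-256 digest of the UTF-8 encoded content.
    static String sign(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

The duplicate check would then compare the short signature column instead of the full `content` text, and identical content always yields the same signature.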
   
   

