czeming opened a new pull request, #9371:
URL: https://github.com/apache/dolphinscheduler/pull/9371
## Purpose of the pull request

This PR closes https://github.com/apache/dolphinscheduler/issues/9174.

1. Add a time range to alarm suppression.
2. Optimize the query logic (see https://github.com/apache/dolphinscheduler/commit/f8ecb536b71d6f33b71c73930832b62890b84ea1).

## Supplement

I read commit `https://github.com/apache/dolphinscheduler/commit/f8ecb536b71d6f33b71c73930832b62890b84ea1` and the class `org.apache.dolphinscheduler.dao.AlertDao`. The commit modified the method `org.apache.dolphinscheduler.dao.AlertDao#sendServerStopedAlert` but left the rest of the logic unchanged.

My guess is that the contributor's scenario was simply to reduce how often the same server-down alarm is sent within a period of time. Judging duplicates by whether the content is equal matches that requirement. Adding the message ID to the check would defeat the logic they added and make the comparison meaningless. Several server-down events for the same host are unlikely to occur at exactly the same moment, so their IDs will always differ.

In addition, if my understanding is correct, the commit itself has a hidden problem. With the following logic, once a host has crashed, later crashes of the same host will never be notified again:

```sql
where content = #{alert.content} and alert_status = #{alert.alertStatus.code}
```

and

```java
ServerAlertContent serverStopAlertContent = ServerAlertContent.newBuilder()
        .type(serverType)
        .host(host)
        .event(AlertEvent.SERVER_DOWN)
        .warningLevel(AlertWarnLevel.SERIOUS)
        .build();
String content = JSONUtils.toJsonString(Lists.newArrayList(serverStopAlertContent));
```

The content is built only from the server type, host, event and warning level. Every crash of the same host therefore produces identical content, and the query above keeps matching the first stored alert with no time limit.

To sum up, the requirements on both sides are:

1. The same server-down alarm should not be sent repeatedly within a period of time.
2. Processing efficiency should be improved.

A low-cost change is to add a time bound to the SQL condition:

```sql
where content = #{alert.content} and alert_status = #{alert.alertStatus.code} and create_time >= ${No alarm start time}
```

However, if the `t_ds_alert` table stores more kinds of alerts in the future, its data volume will grow. Even with `create_time` in the condition, comparing the potentially long `content` field for equality still leaves room for optimization. We could add a field that stores a signature computed from the content, together with any other information that identifies the message, and use that field as the basis for the duplicate check.

-- This is an automated message from the Apache Git Service.
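The suppression rule proposed in this PR suppresses an alert when another alert has the same signature and the same status and was created within a time window. It can be sketched as a small in-memory Java model, assuming the signature is a hex-encoded SHA-256 digest of the alert content.

`AlertSuppressor`, `trySend`, `AlertRow` and the signature column are hypothetical names for illustration only. They are not part of DolphinScheduler's `AlertDao`. The list of rows stands in for `t_ds_alert`. The status is stored as given and never updated, which simplifies the real alert lifecycle.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical in-memory model of a time-bounded, signature-based duplicate check.
 * The list of AlertRow objects stands in for rows of t_ds_alert; none of these
 * names are DolphinScheduler APIs.
 */
class AlertSuppressor {

    /** A stored alert: content signature, status code and creation time. */
    private static final class AlertRow {
        final String signature;
        final int alertStatus;
        final Instant createTime;

        AlertRow(String signature, int alertStatus, Instant createTime) {
            this.signature = signature;
            this.alertStatus = alertStatus;
            this.createTime = createTime;
        }
    }

    private final List<AlertRow> table = new ArrayList<>();
    private final Duration window;

    AlertSuppressor(Duration window) {
        this.window = window;
    }

    /** Assumed signature scheme: SHA-256 of the UTF-8 content, as lowercase hex. */
    static String signature(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /**
     * Records the alert and returns true unless an alert with the same signature
     * and status was created at or after (now - window), in which case it returns
     * false and stores nothing.
     */
    boolean trySend(String content, int alertStatus, Instant now) {
        String sig = signature(content);
        Instant windowStart = now.minus(window);
        // Mirrors: where signature = ? and alert_status = ? and create_time >= ?
        boolean duplicate = table.stream().anyMatch(row ->
                row.signature.equals(sig)
                        && row.alertStatus == alertStatus
                        && !row.createTime.isBefore(windowStart));
        if (duplicate) {
            return false;
        }
        table.add(new AlertRow(sig, alertStatus, now));
        return true;
    }
}
```

Take a 30-minute window as an example. A second identical SERVER_DOWN alert for the same host ten minutes later is suppressed, while one an hour later is sent again. This avoids the "never notified again" behaviour of the unbounded check. In a real table, the signature would presumably live in its own indexed column, so the lookup compares a fixed-length hash instead of the full `content` text.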
