GitHub user shrihari7396 created a discussion: Design Discussion: Embedding 
AlertServer into dolphinscheduler-api Module

Hi all,

I’ve been studying the architectural requirements for embedding the AlertServer 
into the API Server (related to #8975). After reviewing the initialization 
flows in `dolphinscheduler-alert-server` and `dolphinscheduler-api`, I’d like 
to discuss a potential design direction and gather feedback.

My goal is to transition the alerting mechanism from a standalone process to an 
embedded background service while maintaining DolphinScheduler's 
high-availability and reliability standards.

---

## Proposed Technical Direction

### 1. Logic Decoupling
Refactor core logic (e.g., `AlertBootstrapService`, `AlertSender`) into a 
reusable library module that can be natively consumed by `dolphinscheduler-api`.

### 2. Lifecycle Integration
Use Spring-managed components and lifecycle hooks (`@PostConstruct`) to 
initialize the alerting engine within the API Server process only after it 
successfully joins the Registry.

### 3. Leader Election & HA
To prevent duplicate alert processing in horizontally scaled API deployments, 
leverage the existing `RegistryClient` (ZooKeeper/Etcd abstraction) to 
implement a leader-follower model for the embedded alerting loop.
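
As a rough illustration of the leader-follower gating, here is a minimal sketch. The `LeaderLock` interface, `InMemoryLeaderLock`, and `EmbeddedAlertLoop` are hypothetical names invented for this example and are not the actual `RegistryClient` API. A real implementation would back the lock with the registry and would also need to handle session loss.

```java
/** Hypothetical stand-in for a registry-backed lock; NOT the real RegistryClient API. */
interface LeaderLock {
    boolean tryAcquire(String instanceId);
    void release(String instanceId);
}

/** In-memory lock, used only so the sketch runs without ZooKeeper or Etcd. */
class InMemoryLeaderLock implements LeaderLock {
    private String holder;

    @Override
    public synchronized boolean tryAcquire(String instanceId) {
        // Succeeds if the lock is free or already held by this instance.
        if (holder == null) {
            holder = instanceId;
        }
        return holder.equals(instanceId);
    }

    @Override
    public synchronized void release(String instanceId) {
        // Only the current holder may release the lock.
        if (instanceId.equals(holder)) {
            holder = null;
        }
    }
}

/** Embedded alerting loop that does work only while it holds leadership. */
class EmbeddedAlertLoop {
    private final LeaderLock lock;
    private final String instanceId;
    private int processedRounds = 0;

    EmbeddedAlertLoop(LeaderLock lock, String instanceId) {
        this.lock = lock;
        this.instanceId = instanceId;
    }

    /** Runs one polling round; returns false if this instance is a follower. */
    boolean runOnce() {
        if (!lock.tryAcquire(instanceId)) {
            return false; // follower: skip this round
        }
        processedRounds++; // placeholder for "fetch and send pending alerts"
        return true;
    }

    /** Gives up leadership so another API instance can take over. */
    void shutdown() {
        lock.release(instanceId);
    }

    int processedRounds() {
        return processedRounds;
    }
}
```

With two loops sharing one lock, only the first caller of `runOnce()` does any work until it calls `shutdown()`, after which the other instance can take over.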

### 4. Fault Tolerance & Atomicity
- Implement an atomic claim mechanism using SQL-based optimistic locking 
(updating rows to a `SENDING` state with an `instance_id` before processing).
- Introduce a "Janitor" thread on the leader instance to identify and re-queue 
alerts stuck in a `SENDING` state due to unexpected API server crashes.
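
To make the claim and janitor steps concrete, here is a minimal in-memory sketch. The class `AlertClaimStore`, the `PENDING`/`SENT` states, the `claimedAtMillis` field, and the SQL in the comment are assumptions for illustration only; they are not existing DolphinScheduler schema or code. The map's compare-and-replace stands in for a conditional SQL `UPDATE` whose affected-row count decides who wins the claim.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory model of the alert table (hypothetical; a real version would use SQL). */
class AlertClaimStore {
    enum Status { PENDING, SENDING, SENT }

    /** Immutable row snapshot; every state change swaps in a new Row. */
    static final class Row {
        final Status status;
        final String instanceId;
        final long claimedAtMillis;

        Row(Status status, String instanceId, long claimedAtMillis) {
            this.status = status;
            this.instanceId = instanceId;
            this.claimedAtMillis = claimedAtMillis;
        }
    }

    private final Map<Integer, Row> rows = new ConcurrentHashMap<>();

    void enqueue(int alertId) {
        rows.put(alertId, new Row(Status.PENDING, null, 0L));
    }

    /**
     * Stands in for a conditional update such as (hypothetical table and columns):
     *   UPDATE alert SET status = 'SENDING', instance_id = ?, claimed_at = ?
     *   WHERE id = ? AND status = 'PENDING'
     * where exactly one affected row means the claim succeeded.
     */
    boolean tryClaim(int alertId, String instanceId, long nowMillis) {
        Row current = rows.get(alertId);
        if (current == null || current.status != Status.PENDING) {
            return false;
        }
        // Fails if another instance changed the row since we read it.
        return rows.replace(alertId, current, new Row(Status.SENDING, instanceId, nowMillis));
    }

    /** Marks an alert as sent, but only if the caller still owns the claim. */
    boolean markSent(int alertId, String instanceId) {
        Row current = rows.get(alertId);
        if (current == null || current.status != Status.SENDING
                || !instanceId.equals(current.instanceId)) {
            return false;
        }
        return rows.replace(alertId, current,
                new Row(Status.SENT, instanceId, current.claimedAtMillis));
    }

    /** Janitor pass: re-queues alerts that have been SENDING for longer than the timeout. */
    List<Integer> requeueStale(long nowMillis, long timeoutMillis) {
        List<Integer> requeued = new ArrayList<>();
        for (Map.Entry<Integer, Row> entry : rows.entrySet()) {
            Row row = entry.getValue();
            boolean stale = row.status == Status.SENDING
                    && nowMillis - row.claimedAtMillis > timeoutMillis;
            if (stale && rows.replace(entry.getKey(), row, new Row(Status.PENDING, null, 0L))) {
                requeued.add(entry.getKey());
            }
        }
        return requeued;
    }

    Status statusOf(int alertId) {
        Row row = rows.get(alertId);
        return row == null ? null : row.status;
    }
}
```

One consequence worth discussing: if the janitor re-queues an alert whose original sender was only slow rather than dead, the alert may be delivered twice, so this sketch gives at-least-once rather than exactly-once delivery.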

### 5. Performance Isolation
Isolate alerting execution within a dedicated thread pool to ensure that 
long-running notification tasks do not impact the responsiveness of the REST 
API or UI.
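
A minimal sketch of such a dedicated pool, using only `java.util.concurrent`, is below. The class name `AlertExecutors`, the `alert-sender-` thread prefix, and the choice of a bounded queue with `AbortPolicy` are my assumptions for discussion, not existing code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Factory for a pool reserved for alert sending (hypothetical helper). */
class AlertExecutors {

    static ThreadPoolExecutor newAlertPool(int threads, int queueCapacity) {
        AtomicInteger counter = new AtomicInteger();
        // Named daemon threads make alert work easy to spot in thread dumps
        // and do not block JVM shutdown.
        ThreadFactory factory = runnable -> {
            Thread thread = new Thread(runnable, "alert-sender-" + counter.incrementAndGet());
            thread.setDaemon(true);
            return thread;
        };
        return new ThreadPoolExecutor(
                threads, threads,                        // fixed-size pool
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity), // bounded queue caps memory use
                factory,
                // Reject instead of running on the caller's thread, so a slow
                // notification channel cannot stall the polling loop.
                new ThreadPoolExecutor.AbortPolicy());
    }
}
```

With `AbortPolicy`, a full queue surfaces as a `RejectedExecutionException` in the submitting loop, which can then leave the alert unclaimed and retry it in a later round.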

### 6. SPI & Cleanup
Ensure the API Server configuration can dynamically load Alert SPI plugins, 
while decommissioning standalone startup scripts, assembly descriptors, and 
Docker/K8s definitions for the separate Alert component.

---

I would appreciate any feedback or concerns regarding this approach, 
particularly on the distributed coordination strategy, before proceeding 
further with implementation planning.

Best regards,  
**Shrihari Rajendrakumar Kulkarni**

GitHub link: https://github.com/apache/dolphinscheduler/discussions/18005
