LiangDai-Mars opened a new issue, #3998:
URL: https://github.com/apache/amoro/issues/3998

   ### Description
   
   This proposal introduces a new high-availability (HA) service for Amoro 
Management Service (AMS) based on JDBC. This allows AMS to achieve 
primary–standby leader election using a shared relational database, providing 
an alternative to the existing ZooKeeper-based HA mechanism.
   The core characteristics of this feature are:
   - Consistency: Guarantees at most one active leader at any time through 
optimistic concurrency control (row versioning).
   - Recoverability: Ensures automatic failover if the active leader fails. 
Follower nodes can acquire the lease after a configurable time-to-live (TTL) 
expires.
   - Uniqueness: A unique node identifier ensures that each AMS instance is 
distinct within the cluster.
   This feature is introduced in PR 
[#3997](https://github.com/apache/amoro/pull/3997).
   
   
   
   ### Use case/motivation
   
   The existing HA implementation for AMS relies on ZooKeeper. While effective, 
it adds an external dependency that may not be desirable in all 
deployment environments. For users who already operate a relational database 
(like MySQL or PostgreSQL) for AMS metadata, leveraging the same database for 
HA simplifies the architecture, reduces operational overhead, and lowers the 
total cost of ownership.
   By providing a JDBC-based HA option, Amoro offers greater deployment 
flexibility, allowing users to build a fully self-contained HA cluster without 
needing a separate ZooKeeper ensemble.
   
   ### Describe the solution
   
   The solution introduces a new HighAvailabilityContainer interface, with 
JdbcHighAvailabilityContainer as the core implementation for this feature. The 
leader election and failover logic is managed through a dedicated database 
table named ha_lease.
   
   #### Functional Flow
   
   The process operates as follows:
   - Startup: Each AMS node attempts to become the leader as soon as it starts.
   - Lease Acquisition: A node tries to acquire leadership by updating a 
designated row in the ha_lease table. This operation is conditional, succeeding 
only if the current lease has expired. The first node to succeed becomes the 
leader.
   - Heartbeat and Lease Renewal: The active leader periodically sends a 
heartbeat to the database to renew its lease. This is an optimistic-locking 
update that increments a version number and extends the lease_expire_ts (lease 
expiration timestamp).
   - Demotion and Failover: If the leader fails to renew its lease (e.g., due 
to a crash or network partition), the lease expires after the configured TTL. 
Follower nodes, which continuously attempt to acquire the lease, then compete 
for it; exactly one succeeds and is promoted to leader. If the old leader 
recovers, it is demoted to a follower.
   - Server Info Updates: Upon gaining leadership, the active node writes its 
connection information (IP address and ports for different services) to the 
ha_lease table, ensuring clients can discover the active AMS instance.
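
The flow above can be illustrated with a minimal sketch. It is not the code from the PR: it uses Python with an in-memory SQLite database purely so that it runs anywhere. Only the table name ha_lease and the column lease_expire_ts come from this proposal; the other columns (lease_key, holder_id, version, server_info), the key value 'ams-leader', the function names, and the caller-supplied clock are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: only ha_lease and lease_expire_ts are named in the
# proposal; every other column is an assumption for this sketch.
SCHEMA = """
CREATE TABLE ha_lease (
    lease_key       TEXT PRIMARY KEY,
    holder_id       TEXT,
    version         INTEGER NOT NULL,
    lease_expire_ts INTEGER NOT NULL,
    server_info     TEXT
)
"""
KEY = "ams-leader"  # assumed name of the single designated row


def init(conn):
    conn.execute(SCHEMA)
    # Seed the row as already expired so the first node can claim it.
    conn.execute("INSERT INTO ha_lease VALUES (?, NULL, 0, 0, NULL)", (KEY,))
    conn.commit()


def try_acquire(conn, node_id, now_ms, ttl_ms, server_info):
    """Claim the lease only if it has expired. Returns the new version or None."""
    (version,) = conn.execute(
        "SELECT version FROM ha_lease WHERE lease_key = ?", (KEY,)
    ).fetchone()
    cur = conn.execute(
        "UPDATE ha_lease SET holder_id = ?, version = version + 1, "
        "lease_expire_ts = ?, server_info = ? "
        "WHERE lease_key = ? AND version = ? AND lease_expire_ts < ?",
        (node_id, now_ms + ttl_ms, server_info, KEY, version, now_ms),
    )
    conn.commit()
    # rowcount is 1 only if the row was unchanged and expired when we wrote it.
    return version + 1 if cur.rowcount == 1 else None


def renew(conn, node_id, version, now_ms, ttl_ms):
    """Heartbeat: extend the lease if we still hold it at the expected version."""
    cur = conn.execute(
        "UPDATE ha_lease SET version = version + 1, lease_expire_ts = ? "
        "WHERE lease_key = ? AND holder_id = ? AND version = ?",
        (now_ms + ttl_ms, KEY, node_id, version),
    )
    conn.commit()
    return version + 1 if cur.rowcount == 1 else None


conn = sqlite3.connect(":memory:")
init(conn)
v_a = try_acquire(conn, "node-a", 1_000, 10_000, "10.0.0.1")    # node-a leads
v_b = try_acquire(conn, "node-b", 2_000, 10_000, "10.0.0.2")    # lease still valid
v_a = renew(conn, "node-a", v_a, 5_000, 10_000)                 # heartbeat
v_b2 = try_acquire(conn, "node-b", 16_000, 10_000, "10.0.0.2")  # after expiry
stale = renew(conn, "node-a", v_a, 17_000, 10_000)              # old leader returns
```

In this sketch node-b's first attempt is rejected because the lease has not expired, its second attempt succeeds once the TTL has passed without a heartbeat, and node-a's late renewal matches no row, which is the signal it would use to demote itself. A real implementation would also have to decide whose clock defines "now" (the database's or each node's); the sketch sidesteps that by passing timestamps in explicitly.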
   
   #### Compatibility
   
   - Existing HA: The new implementation is fully compatible with the existing 
HA framework. The choice between jdbc and zk is determined by the ha.type 
configuration property. If HA is disabled (ha.enable=false), a 
NoopHighAvailabilityContainer is used, and the system runs as a standalone node.
   - AMS Components: The HA logic is encapsulated within the amoro-ams module 
and integrates seamlessly with the Amoro service startup container.
   - Database Support: The ha_lease table schema is compatible with Derby, 
MySQL, and PostgreSQL. Initialization scripts are provided in the resources to 
create the required table and indexes.
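
The selection rule described under "Existing HA" can be sketched as a small function. This is only an illustration in Python, not the AMS startup code: the function name, the use of a plain dict for configuration, the default of zk when ha.type is absent, and the class name ZkHighAvailabilityContainer are all assumptions. Only ha.enable, ha.type, the values jdbc and zk, NoopHighAvailabilityContainer, and JdbcHighAvailabilityContainer appear in this proposal.

```python
def select_ha_container(config: dict) -> str:
    """Return the name of the HA container a node would use for these settings."""
    # HA disabled: the node runs standalone with a no-op container.
    if not config.get("ha.enable", False):
        return "NoopHighAvailabilityContainer"

    ha_type = config.get("ha.type", "zk")  # default value is an assumption
    if ha_type == "jdbc":
        return "JdbcHighAvailabilityContainer"
    if ha_type == "zk":
        return "ZkHighAvailabilityContainer"  # hypothetical class name
    raise ValueError(f"unsupported ha.type: {ha_type!r}")
```

For example, `select_ha_container({"ha.enable": True, "ha.type": "jdbc"})` picks the JDBC container, while any configuration with `ha.enable` set to false falls back to the no-op container regardless of `ha.type`.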
   
   ### Subtasks
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   

