HxpSerein commented on issue #3905: URL: https://github.com/apache/incubator-streampark/issues/3905#issuecomment-2257530743
# Further Design Proposal

## Motivation

This proposal is to implement StreamPark HA **in two stages**.

The first stage makes the **fewest possible changes to the console** and focuses on designing and implementing the necessary **registry center**, **resource center** and **job distribution algorithm** to deliver a basic high-availability version. The aim is to provide a functional high-availability system quickly, with minimal disruption to the current setup.

The second stage will **refactor and split** the console into **master** and **worker** components to deliver a complete high-availability version. It will focus on performance and scalability, so that the system can handle larger workloads and more complex operations.

This proposal currently covers only the Basic StreamPark HA Architecture.

## Basic StreamPark HA Architecture

### Registry Center

The registry center is responsible for the registration and discovery of servers. It lets servers coordinate correctly and improves the overall reliability of the system.

### Resource Center

The resource center stores jar packages and other necessary resources. Starting a job no longer depends on a specific server; it can happen on any server. A new server can fetch the jar packages from the resource center, which provides high availability and disaster tolerance. Because jobs can be started and managed from any server, single points of failure are reduced.

### Job Distribution

A consistent hashing algorithm will be used for job distribution, and for job migration when the cluster is expanded or a server fails. In the basic high-availability version there is no additional communication between servers: each server receives its tasks by polling the job distribution table. This approach is intended to keep jobs evenly distributed and to limit how many jobs must move when servers fail or are added.
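
To make the job distribution idea concrete, here is a minimal sketch of consistent hashing with virtual nodes. It is not StreamPark code: the class `ConsistentHashRing`, the helper `assign`, the server ids `console-1` to `console-3`, the job ids, and the choice of 100 virtual nodes and MD5 are all assumptions made for illustration. The dictionary returned by `assign` merely stands in for the job distribution table that servers would poll.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable 64-bit integer (MD5 is used for spreading, not security)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Hypothetical hash ring; each server owns several points (virtual nodes)."""

    def __init__(self, virtual_nodes: int = 100):
        self.virtual_nodes = virtual_nodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> server id

    def add_server(self, server_id: str) -> None:
        for i in range(self.virtual_nodes):
            point = _hash(f"{server_id}#{i}")
            if point in self._owner:
                continue  # keep the existing owner on the (very unlikely) collision
            bisect.insort(self._points, point)
            self._owner[point] = server_id

    def remove_server(self, server_id: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != server_id]
        self._owner = {p: s for p, s in self._owner.items() if s != server_id}

    def server_for(self, job_id: str) -> str:
        """Return the owner of the first ring point clockwise from the job's hash."""
        if not self._points:
            raise RuntimeError("no servers registered")
        idx = bisect.bisect_right(self._points, _hash(job_id)) % len(self._points)
        return self._owner[self._points[idx]]


def assign(ring: ConsistentHashRing, job_ids):
    """Build a job -> server mapping, standing in for the job distribution table."""
    return {job: ring.server_for(job) for job in job_ids}


ring = ConsistentHashRing()
for server in ("console-1", "console-2", "console-3"):
    ring.add_server(server)

jobs = [f"job-{n}" for n in range(1000)]
before = assign(ring, jobs)

# Simulate a server failure: only the failed server's jobs should be reassigned.
ring.remove_server("console-2")
after = assign(ring, jobs)
moved = [job for job in jobs if before[job] != after[job]]
print(f"{len(moved)} of {len(jobs)} jobs migrated after console-2 was removed")
```

The property that matters for migration is that removing a server reassigns only the jobs that server owned, and adding a server moves jobs only onto the new server; every other job keeps its current owner. Virtual nodes are one common way to smooth out the distribution, but the count used here is arbitrary.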

**_Note_**: Remember to add the documentation at the end, including detailed descriptions of the registry center, resource center, job distribution algorithm, and overall architecture.

### Compatibility, Deprecation, and Migration Plan

Add upgrade and deployment scripts for users. Ensure these scripts are well-documented and easy to follow, allowing users to transition smoothly to the new high-availability architecture. Include clear instructions for migrating existing jobs and resources.

### Test Plan

Add end-to-end (e2e) tests to ensure stability. These tests should cover:

- Server registration and discovery
- Resource retrieval and job startup from the resource center
- Job distribution and migration under various scenarios, including server failures and cluster expansions

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
