HxpSerein commented on issue #3905:
URL: 
https://github.com/apache/incubator-streampark/issues/3905#issuecomment-2257530743

   # Further Design Proposal
   
   ## Motivation
   The further design proposal is to implement StreamPark HA **in two stages**. 
   
   The first stage will make **minimal changes to the console** and focus on 
designing and implementing the **registry center**, **resource center**, and 
**job distribution algorithm** needed for a basic, deliverable 
high-availability version. The aim is to provide a working high-availability 
system quickly, with minimal disruption to the current setup.
   
   The second stage will further **refactor and split** the console into 
**master** and **worker** components to achieve a complete high-availability 
version. This stage will focus on optimizing performance and scalability, 
ensuring the system can handle larger workloads and more complex operations.
   
   This proposal currently focuses only on the Basic StreamPark HA Architecture.
   
   ## Basic StreamPark HA Architecture
   
![image](https://github.com/user-attachments/assets/38e0804d-9e4c-4739-bb76-42a25ea36676)
   
   ### Registry Center
   The registry center is responsible for the registration and discovery of 
servers. It gives the cluster a shared view of which servers are currently 
available, which job distribution and failover rely on.
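
As a rough illustration of registration and discovery, here is a minimal sketch of a heartbeat-based registry. Everything in it is an assumption made for illustration: the class name `HeartbeatRegistry`, the timeout parameter, and the in-memory map that stands in for whatever shared store (for example a database table or ZooKeeper) the real registry center would use. The proposal does not specify any of these.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

/** Hypothetical heartbeat registry; an in-memory map stands in for the shared store. */
final class HeartbeatRegistry {
    private final long timeoutMillis;
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    HeartbeatRegistry(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Registers the server on its first call and refreshes its lease afterwards. */
    void heartbeat(String serverId, long nowMillis) {
        lastHeartbeat.put(serverId, nowMillis);
    }

    /** Explicit deregistration, e.g. on graceful shutdown. */
    void unregister(String serverId) {
        lastHeartbeat.remove(serverId);
    }

    /** Discovery: ids of servers whose last heartbeat is within the timeout, sorted. */
    List<String> aliveServers(long nowMillis) {
        return lastHeartbeat.entrySet().stream()
                .filter(e -> nowMillis - e.getValue() <= timeoutMillis)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

The clock is passed in explicitly so that expiry can be tested without sleeping. A server that stops sending heartbeats simply drops out of `aliveServers` once the timeout elapses.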
   
   ### Resource Center
   The resource center provides storage for JAR packages and other required 
resources. Starting a job no longer depends on a specific server; it can 
happen on any server. When a job is started on or migrated to a different 
server, that server obtains the JAR packages from the resource center. This 
removes the server that originally held the files as a single point of 
failure and supports high availability and disaster tolerance.
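
The idea can be sketched as follows, assuming for illustration that the resource center is simply a directory every server can reach (for example a shared mount). The class `DirectoryResourceCenter` and its methods are hypothetical; the proposal does not say which storage backend will be used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical resource center backed by a directory that all servers can read. */
final class DirectoryResourceCenter {
    private final Path root;

    DirectoryResourceCenter(Path root) {
        this.root = root;
    }

    /** Stores a local file (e.g. a job JAR) in the resource center under the given name. */
    void upload(String resourceName, Path localFile) throws IOException {
        Files.createDirectories(root);
        Files.copy(localFile, root.resolve(resourceName), StandardCopyOption.REPLACE_EXISTING);
    }

    /** Copies a stored resource into the calling server's working directory. */
    Path fetch(String resourceName, Path workDir) throws IOException {
        Files.createDirectories(workDir);
        Path target = workDir.resolve(resourceName);
        Files.copy(root.resolve(resourceName), target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }
}
```

Any server that takes over a job calls `fetch` before starting it, so the job no longer depends on files that exist only on the server where it was first uploaded.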
   
   ### Job Distribution
   A consistent hashing algorithm will be used for job distribution, and for 
job migration when the cluster is expanded or a server fails. In the basic 
high-availability version, there is no additional communication between 
servers; each server picks up its tasks by polling the job distribution table. 
Consistent hashing spreads jobs roughly evenly across servers, and when a 
server is added or removed, only the jobs that map to that server should need 
to move.
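
A minimal sketch of a consistent-hash ring is shown below to make the migration behaviour concrete. The class name `ConsistentHashRing`, the use of virtual nodes, and the MD5-based hash are assumptions chosen for illustration, not details taken from the proposal or from StreamPark's code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical consistent-hash ring mapping job ids to server ids. */
final class ConsistentHashRing {
    private final int virtualNodes;
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    /** Places several virtual nodes per server on the ring to even out the load. */
    void addServer(String serverId) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(serverId + "#" + i), serverId);
        }
    }

    /** Removes only the ring positions that belong to this server. */
    void removeServer(String serverId) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(serverId + "#" + i), serverId);
        }
    }

    /** Returns the owner of the first ring position at or after the job's hash, wrapping around. */
    String serverFor(String jobId) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers registered");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(jobId));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** First 8 bytes of the MD5 digest, interpreted as a long. */
    private static long hash(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this structure, removing a server reassigns only the jobs that server owned, and adding a server moves jobs only onto the new server. In a polling setup like the one described, each server could compute `serverFor(jobId)` against the current list of registered servers and pick up the rows that map to its own id; that wiring is an assumption about how the pieces might fit together.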
   
   **_Note_**: Remember to add the documentation at the end, including detailed 
descriptions of the registry center, resource center, job distribution 
algorithm, and overall architecture.
   
   ### Compatibility, Deprecation, and Migration Plan
   Add upgrade and deployment scripts for users. Ensure these scripts are 
well-documented and easy to follow, allowing users to transition smoothly to 
the new high-availability architecture. Include clear instructions for 
migrating existing jobs and resources.
   
   ### Test Plan
   Add end-to-end (e2e) tests to ensure stability. These tests should cover:
   - Server registration and discovery
   - Resource retrieval and job startup from the resource center
   - Job distribution and migration under various scenarios, including server 
failures and cluster expansions
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
