Re: [I] [SPIP-1] Suggest add HA for StreamPark [incubator-streampark]

via GitHub Wed, 31 Jul 2024 05:29:55 -0700


HxpSerein commented on issue #3905:
URL: 
https://github.com/apache/incubator-streampark/issues/3905#issuecomment-2260409378


   >In general, a database provides distributed consistency and high 
availability as well.
   >
   >As I understand it, StreamPark relies heavily on the database. All user 
information and job information are stored in the database, so StreamPark 
cannot work once the database has crashed, even if the registry center uses 
ZooKeeper.
   >
   >If we introduce ZooKeeper, StreamPark will be unavailable whenever either 
ZooKeeper or the database crashes. This increases the maintenance cost for 
users and makes StreamPark more likely to be unavailable.
   
   If the database has distributed consistency and high availability, it is 
indeed simpler to use the database as the registry center.
   
   >As in the example I mentioned before, in a database system all operations 
on the same key should be forwarded to the same server. But I still don't 
understand why the same job should be monitored on the same server.
   >
   >Or my question is: if a job is monitored by one random server, does it work? 
If yes, there is no need to introduce complex consistent hashing.
   
   First, I agree with @SbloodyS's viewpoint that only job monitoring needs to 
be distributed among servers.
   
   Second, I believe that consistent hashing does not introduce much additional 
complexity to the architecture. When job monitoring needs to be migrated or 
allocated, we can simply invoke the algorithm to produce the allocation plan. 
This algorithm could be consistent hashing, greedy, or random. Regardless of 
the algorithm, our framework only needs to invoke it through a common 
interface.
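   
   As a rough illustration of the common interface idea, here is a minimal 
sketch. It is not existing StreamPark code: `JobAllocator`, 
`ConsistentHashAllocator`, `RandomAllocator`, the virtual node count, and the 
use of CRC32 as the hash function are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical interface: the framework would only depend on this.
interface JobAllocator {
    // Returns a plan mapping each job id to the server that should monitor it.
    Map<String, String> allocate(List<String> jobIds, List<String> servers);
}

// Consistent hashing: every server owns several points on a hash ring,
// and a job goes to the first server point at or after the job's hash.
class ConsistentHashAllocator implements JobAllocator {
    private final int virtualNodes;

    ConsistentHashAllocator(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // CRC32 is used only to keep the sketch dependency-free.
    private static long hash(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    @Override
    public Map<String, String> allocate(List<String> jobIds, List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no servers available");
        }
        TreeMap<Long, String> ring = new TreeMap<>();
        for (String server : servers) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(hash(server + "#" + i), server);
            }
        }
        Map<String, String> plan = new HashMap<>();
        for (String jobId : jobIds) {
            Map.Entry<Long, String> owner = ring.ceilingEntry(hash(jobId));
            if (owner == null) {
                owner = ring.firstEntry(); // wrap around the ring
            }
            plan.put(jobId, owner.getValue());
        }
        return plan;
    }
}

// Random allocation behind the same interface.
class RandomAllocator implements JobAllocator {
    private final Random random;

    RandomAllocator(long seed) {
        this.random = new Random(seed);
    }

    @Override
    public Map<String, String> allocate(List<String> jobIds, List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no servers available");
        }
        Map<String, String> plan = new HashMap<>();
        for (String jobId : jobIds) {
            plan.put(jobId, servers.get(random.nextInt(servers.size())));
        }
        return plan;
    }
}
```

   With the consistent hashing variant, calling `allocate` again after a server 
is removed from the list only moves the jobs that server owned; jobs on the 
remaining servers keep their assignment. The random variant gives no such 
guarantee, but the caller code stays the same.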
   
   Finally, I believe that the primary goal should be to implement the overall 
distributed framework. The registry center and allocation algorithms are 
options that can be modified within the framework. Considering the workload and 
complexity, it is feasible to first implement a registry center using a 
database. Once the framework is in place, implementing the allocation 
algorithms will be relatively straightforward, and both greedy and consistent 
hashing algorithms can be considered.
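   
   To make the "registry center as a replaceable option" point concrete, here 
is a small sketch under stated assumptions. `ServerRegistry` and 
`InMemoryServerRegistry` are hypothetical names, and the in-memory map only 
stands in for a database table of `(server_id, last_heartbeat)` rows; a real 
database-backed or ZooKeeper-backed registry would implement the same 
interface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface: the framework would only see this abstraction.
interface ServerRegistry {
    // Records that the given server was alive at nowMillis.
    void heartbeat(String serverId, long nowMillis);

    // Returns servers whose last heartbeat is within the timeout, sorted by id.
    List<String> aliveServers(long nowMillis);
}

// In-memory stand-in for a database table (server_id, last_heartbeat).
class InMemoryServerRegistry implements ServerRegistry {
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    InMemoryServerRegistry(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public void heartbeat(String serverId, long nowMillis) {
        lastHeartbeat.put(serverId, nowMillis);
    }

    @Override
    public List<String> aliveServers(long nowMillis) {
        List<String> alive = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastHeartbeat.entrySet()) {
            if (nowMillis - entry.getValue() <= timeoutMillis) {
                alive.add(entry.getKey());
            }
        }
        Collections.sort(alive);
        return alive;
    }
}
```

   The list returned by `aliveServers` is what an allocation algorithm would 
take as its server input, so the registry and the algorithm can be changed 
independently.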
   
   Please correct me if I have misunderstood anything, thanks~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
