empiredan commented on issue #865:
URL: https://github.com/apache/incubator-pegasus/issues/865#issuecomment-1144905936

   > > > > > > Good enhancement~
   > > > > > > I have some questions about updating the table replica count:
   > > > > > > 
   > > > > > > 1. There's also a cluster-level config called `max_replica_count`; will there be a conflict if the cluster `max_replica_count` is not equal to the table `max_replica_count`? Will important features such as user read/write, learn, load balance, and rolling_update still work?
   > > > > > > 2. The table `app_info` file is stored on each replica's disk (_dir/.app_info); if `max_replica_count` is updated, this file should also be updated. How will it be updated?
   > > > > > > 3. In your description, `cure` (executed every 10s) will delete redundant replicas or add missing replicas. Could you please add some manual test results?
   > > > > > > 
   > > > > > > Expecting your reply~
   > > > > > 
   > > > > > 
   > > > > > Ok, let's discuss them one by one:
   > > > > > 
   > > > > > 1. Does the cluster-level config mean `max_allowed_replica_count` in [feat: restrict the replication factor while creating app XiaoMi/rdsn#963](https://github.com/XiaoMi/rdsn/pull/963)? Actually `max_replica_count` is the RF, and `max_allowed_replica_count` is the upper bound to which an RF can be set. Or should we rename `max_replica_count` to `replication_factor` to avoid misunderstanding?
   > > > > > 2. Good question! This implementation hasn't written into `.app_info`; this is a bug! To guarantee atomicity, I think the `set_replica_count` request can first be replicated to each replica; on receiving the request, each replica writes it into `slog`, and then writes the new `max_replica_count` into `.app_info`. Even if a replica server fails during the process, the request will not be lost.
   > > > > > 3. As described in this issue, increasing `max_replica_count` has been tested sufficiently; however, decreasing `max_replica_count` has not. I'll test decreasing later to see whether redundant replicas can be processed correctly.
   > > > > 
   > > > > 
   > > > > Thanks for your reply~ Let's continue the discussion:
   > > > > 
   > > > > 1. I mean the meta option called `max_replicas_in_group`; you can see it here:
   > > > >    https://github.com/XiaoMi/rdsn/blob/289eb4609ae781e3bcc6bbc6bfeae64dfd8fa785/src/meta/meta_options.cpp#L160-L161
   > > > > 2. I suggest updating `max_replica_count` in `.app_info` through `on_config_sync`
   > > > > 3. I think decreasing the replica count may have some problems; expecting your test results~
   > > > 
   > > > 
   > > > Thanks for the suggestion!
   > > > 
   > > > 1. I think we can eliminate `max_replicas_in_group` and switch to the partition-level `max_replica_count`. I'll explain this below:
   > > > 
   > > > Currently `max_replicas_in_group` is only used to update `config_context::MAX_REPLICA_COUNT_IN_GRROUP`, and `config_context::MAX_REPLICA_COUNT_IN_GRROUP` is only used in:
   > > > ```c++
   > > > void config_context::check_size()
   > > > {
   > > >     // when add learner, it is possible that replica_count > max_replica_count, so we
   > > >     // need to remove things from dropped only when it's not empty.
   > > >     while (replica_count(*config_owner) + dropped.size() > MAX_REPLICA_COUNT_IN_GRROUP &&
   > > >            !dropped.empty()) {
   > > >         dropped.erase(dropped.begin());
   > > >         prefered_dropped = (int)dropped.size() - 1;
   > > >     }
   > > > }
   > > > ```
   > > > 
   > > > However, `MAX_REPLICA_COUNT_IN_GRROUP` can be replaced with the partition-level `max_replica_count` as below:
   > > > ```c++
   > > > void config_context::check_size()
   > > > {
   > > >     // when add learner, it is possible that replica_count > max_replica_count, so we
   > > >     // need to remove things from dropped only when it's not empty.
   > > >     while (replica_count(*config_owner) + dropped.size() > config_owner->max_replica_count &&
   > > >            !dropped.empty()) {
   > > >         dropped.erase(dropped.begin());
   > > >         prefered_dropped = (int)dropped.size() - 1;
   > > >     }
   > > > }
   > > > ```
   > > > 
   > > > The `partition_configuration` pointed to by `config_owner` is updated in `server_state::on_update_configuration_on_remote_reply`, which will be called by `set_replica_count`.
   > > > 
   > > > 2. Good idea! Every `config_sync_interval_ms` (typically 30 ms) the replica server sends a `query_configuration_by_node` request to the meta server; on receiving the response, if the ballot in the response is newer than the local one, the replica server will update every table-level and partition-level `max_replica_count` to the one in the response. Is that right?
   > > > 3. Actually, decreasing the replica count has been forbidden by us until now. Decreasing is more error-prone than increasing. If we decide to support decreasing the replica count, I'll test it sufficiently.
   > > 
   > > 
   > > Thanks for your reply~
   > > 
   > > 1. If you have already verified that the table-level replica_count can replace the cluster-level `max_replicas_in_group`, just go ahead~
   > > 2. Yes, I recommend updating the `.app_info` file through the `on_config_sync` RPC
   > > 3. Okay. How about changing this issue title from `update the replication factor` to `increase the replication factor`, since decreasing is not supported?
   > 
   > Ok, we can support increasing the replication factor first. Decreasing can be supported later if needed.
   
   Now our implementation also supports decreasing the replication factor.
   
   Note that the redundant replica data will be garbage-collected only when the meta level is lively, which means `set_meta_level lively` must be executed first.
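   To summarize the behavior described above, here is a minimal standalone sketch (not the actual Pegasus implementation; the type `meta_function_level` and the function `replicas_to_adjust` are hypothetical names for illustration) of the decision the periodic `cure` pass makes for one partition: a positive result means learners must be added, while redundant replicas are dropped only when the meta level is lively.

   ```cpp
   #include <cassert>

   // Hypothetical enum standing in for the meta server's function level.
   enum class meta_function_level { steady, lively };

   // Sketch: how many replicas to add (positive) or drop (negative) for one
   // partition, given the table's target max_replica_count. Redundant replicas
   // are only dropped once `set_meta_level lively` has been executed.
   int replicas_to_adjust(int current_replica_count,
                          int max_replica_count,
                          meta_function_level level)
   {
       int diff = max_replica_count - current_replica_count;
       if (diff < 0 && level != meta_function_level::lively) {
           // Keep redundant replicas until the meta level becomes lively.
           return 0;
       }
       return diff;
   }

   int main()
   {
       // Increasing RF from 3 to 5: two learners are added regardless of level.
       assert(replicas_to_adjust(3, 5, meta_function_level::steady) == 2);
       // Decreasing RF from 5 to 3: nothing is dropped while steady...
       assert(replicas_to_adjust(5, 3, meta_function_level::steady) == 0);
       // ...but two redundant replicas may be dropped once lively.
       assert(replicas_to_adjust(5, 3, meta_function_level::lively) == -2);
       return 0;
   }
   ```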
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

