Jackie-Jiang commented on PR #8441:
URL: https://github.com/apache/pinot/pull/8441#issuecomment-1088071346

   > Yes, I agree that ideally we want to rely on the determinism of the
selection algorithm to deduce the vacant seats and pool IDs of down servers,
without preserving any state. But IMO storing the RG -> pool mapping is still
not enough. Take the following case as an example of a different assignment
strategy that does not require a 1:N pool to replica mapping:
   > 
   > 24 instances in 5 pools, with [5,5,5,4,5] servers in each pool, i.e. [s0-s4,
s5-s9, s10-s14, s15-s18, s19-s23]. We want to assign these 24 instances to 3
RGs, and an instance selection strategy gives: RG0: [s0-s7] RG1: [s8-s15]
RG2: [s16-s23]
   > 
   > This yields an RG -> pool mapping of: RG0 -> pool0, pool1; RG1 -> pool1,
pool2, pool3; RG2 -> pool3, pool4
   > 
   > Taking out 3 instances, s6 (pool1), s11 (pool2), and s21 (pool4), leaves
[5,4,4,4,4] servers in each pool and [7, 7, 7] servers in each RG. Then for
InstanceTagPoolSelector/PartitionSelector it would be very hard to figure out
(1) which pool each lost instance belonged to and (2) where the vacant seats
are in each pool.
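
   The arithmetic in the quoted example can be checked with a short sketch. It
assumes contiguous server numbering (s0 through s23) and reads the three
replica groups as RG0, RG1 and RG2. The helper names below are hypothetical and
are not part of Pinot.

```python
# Pool layout from the quoted example: 24 servers, pool sizes [5, 5, 5, 4, 5].
POOL_SIZES = [5, 5, 5, 4, 5]
NUM_REPLICA_GROUPS = 3
REMOVED = {6, 11, 21}  # s6 (pool1), s11 (pool2), s21 (pool4)


def build_pools(pool_sizes):
    """Map each server index to its pool id, assuming contiguous pools."""
    server_to_pool = {}
    server = 0
    for pool_id, size in enumerate(pool_sizes):
        for _ in range(size):
            server_to_pool[server] = pool_id
            server += 1
    return server_to_pool


def build_replica_groups(num_servers, num_groups):
    """Split servers into equal, contiguous replica groups."""
    per_group = num_servers // num_groups
    return [list(range(g * per_group, (g + 1) * per_group))
            for g in range(num_groups)]


server_to_pool = build_pools(POOL_SIZES)
groups = build_replica_groups(len(server_to_pool), NUM_REPLICA_GROUPS)

# Pools touched by each replica group.
rg_to_pools = [sorted({server_to_pool[s] for s in group}) for group in groups]

# Sizes after the three servers are taken out.
pool_sizes_after = [
    sum(1 for s, p in server_to_pool.items() if p == pool and s not in REMOVED)
    for pool in range(len(POOL_SIZES))
]
group_sizes_after = [len([s for s in g if s not in REMOVED]) for g in groups]

print(rg_to_pools)        # [[0, 1], [1, 2, 3], [3, 4]]
print(pool_sizes_after)   # [5, 4, 4, 4, 4]
print(group_sizes_after)  # [7, 7, 7]
```

   Note that RG1 spans three pools, so the RG -> pool mapping and the surviving
counts alone do not say which of those pools lost a server, which is the
difficulty the quoted example describes.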
   
   I assume this is for fault domain assignment, which is not yet supported.
Handling this scenario under the fault domain constraint (no servers from
different replicas in the same fault domain) can be quite hard, and the current
implementation won't cover that either.
   
   I wouldn't mix the problem this PR is trying to solve with fault domain
support. Essentially, what we want this PR to achieve is to minimize server
shift when picking servers from the candidate servers. That should be handled
within the server selection step. Even with fault domain assignment introduced,
as long as the fault domain pool selection algorithm is deterministic, the
simple algorithm should give a good enough result: when one server is replaced,
in the worst case we need to shift segments from one existing server to another
if it is assigned to a different replica group, instead of the current worst
case of shifting all segments from all servers.
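
   To illustrate what "minimize the server shift" could look like inside the
server selection step, here is a minimal sketch. It is not the implementation
in this PR. `select_servers` and `moved_servers` are hypothetical helpers, and
the sketch assumes replica groups are plain lists of server names and that
spare candidates are consumed in sorted order to keep the result deterministic.

```python
def select_servers(candidates, current_groups, group_size):
    """Keep surviving servers in their replica group, then fill vacancies."""
    candidate_set = set(candidates)
    # Servers that are still candidates stay where they are.
    new_groups = [[s for s in group if s in candidate_set]
                  for group in current_groups]
    assigned = {s for group in new_groups for s in group}
    # Unassigned candidates, in deterministic (sorted) order.
    spare = [s for s in sorted(candidate_set) if s not in assigned]
    for group in new_groups:
        while len(group) < group_size and spare:
            group.append(spare.pop(0))
    return new_groups


def moved_servers(old_groups, new_groups):
    """Servers whose replica group differs between the two assignments."""
    old = {s: i for i, g in enumerate(old_groups) for s in g}
    new = {s: i for i, g in enumerate(new_groups) for s in g}
    return sorted(s for s in new if old.get(s) != new[s])


# s2 is replaced by s4; every other server keeps its replica group.
current = [["s0", "s1"], ["s2", "s3"]]
result = select_servers(["s0", "s1", "s3", "s4"], current, group_size=2)
print(result)                          # [['s0', 's1'], ['s3', 's4']]
print(moved_servers(current, result))  # ['s4']
```

   Only the replacement server receives segments here. The selection depends
on nothing but the candidate list and the current assignment, so it does not
rely on how the pools were chosen.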
   
   In general, we should not try to solve the server selection problem in the
pool selection step, because that would introduce dependencies among the
different steps and prevent them from working with other algorithms.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

