kfaraz commented on issue #18764:
URL: https://github.com/apache/druid/issues/18764#issuecomment-3557691967

   @jtuglu1 , thanks for reporting this!
   
   IIUC, there are two parts to the problem.
   
   ### A
   
   > The issue is that there is a momentary lapse in time between when server B 
loads the segment (and it 
[appears](https://github.com/apache/druid/blob/-/server/src/main/java/org/apache/druid/server/coordinator/loading/SegmentReplicaCountMap.java#L54)
 as loaded in accounting) and when the callback for dropping on server A occurs.
   
   I think this may be true, but more from the Broker's perspective, i.e., it is 
possible that the Broker sees the drop from A before it sees the load on B, 
causing the segment to be unavailable for a brief period. Since the Coordinator 
is not aware of the inventory of the individual Brokers, there isn't much we 
can do here.
   
   Generally, the Coordinator's own inventory is meant to serve as a proxy for 
what the Brokers see. If the Coordinator's inventory does not show S on B, the 
Coordinator wouldn't decide to remove the segment from B anyway. On the 
contrary, it might try to load S somewhere else (on A, B, or maybe even C), 
leading to over-replication in the next cycle and then another drop from 
somewhere.
   
   But I assume this is not the case discussed here.
   
   #### Solution
   
   Even though this is not the bug reported in this ticket, there is room for 
improvement (or at least an alternate strategy).
   
   We could delay the drop from A until Coordinator is sure that B has loaded S 
(this would save the extra work which sometimes ends up nullifying the move 
from A to B by reloading the segment on A 😅 ).
   - An easy way to do that would be to leave S in `segmentsMarkedToDrop` of 
peon A and do nothing in the callback of `SegmentLoadQueueManager`.
   - When S shows up on B and we recognize it as over-replicated (in a later 
duty run), we prioritize drop of the extra S from A (since it is already marked 
to drop from A).
   - We might also need to tweak the calculation of replicas so that 
`movingFrom` segments also count towards over-replication.
   - I suppose the drawback is that segment balancing would now appear slower, 
as segments would stick around on A until the next Coordinator cycle 
(typically 1 minute).
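   The deferred-drop flow above could be sketched roughly as follows. This is a 
toy model of the bookkeeping only; the class and method names here are 
illustrative, not Druid's actual API:

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the deferred-drop idea: the source server only *marks* the
// segment for dropping during the move, and the actual drop happens in a
// later duty run, once the destination is confirmed to have loaded it.
// All names are illustrative stand-ins, not Druid's real classes.
public class DeferredDropDemo {

    static class Server {
        final Set<String> loaded = new HashSet<>();
        final Set<String> markedToDrop = new HashSet<>();
    }

    // Returns a short summary of what the later duty run decided to do.
    static String simulate() {
        Server a = new Server();
        Server b = new Server();
        a.loaded.add("S");

        // Move S from A to B: mark S on A, load S on B, but drop nothing yet.
        a.markedToDrop.add("S");
        b.loaded.add("S");

        // Later duty run: S is now over-replicated (2 copies, 1 required).
        int replicas = (a.loaded.contains("S") ? 1 : 0)
                     + (b.loaded.contains("S") ? 1 : 0);
        if (replicas > 1 && a.markedToDrop.contains("S")) {
            // Prefer dropping the copy that was already marked during the move.
            a.markedToDrop.remove("S");
            a.loaded.remove("S");
            return "dropped-from-A";
        }
        return "no-op";
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

   If B never finishes loading S, the marked copy on A simply stays loaded, 
which is what makes the strategy safer at the cost of slower balancing.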
   
   ### B
   
   > Because the projectedReplicas (calculated 
[here](https://github.com/apache/druid/blob/-/server/src/main/java/org/apache/druid/server/coordinator/loading/StrategicSegmentAssigner.java#L266))
 is: loadedNotDropping + loading - max(0, moveFrom - moveTo) = 2 (both A/B are 
loaded at this point) + 0 - max(0, 1 - 1) = 2, this causes the drop to be 
subsequently scheduled on B.
   
   This can probably play out as follows (T1 being the Coordinator duty thread 
and T2 the thread running the `SegmentLoadQueueManager` callback):
   - T1: Duty run starts
   - T1: Initialize `ServerHolder` for A
   - T2: `SegmentLoadQueueManager` calls `peonA.unmarkSegmentToDrop(s)`
   - T1: Considers segment S to have status `loaded` on A
   - T2: `SegmentLoadQueueManager` calls `peonA.dropSegment(s)`
   - T1: Runs rules assuming S is loaded on both A and B
   
   This seems closer to what you are witnessing. Just to clarify though, the 
`moveFrom` and `moveTo` counts would both be 0 in this case.
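   Plugging that race into the quoted formula gives the bad answer directly. A 
minimal sketch of the arithmetic (the real computation lives in 
`StrategicSegmentAssigner`; this just restates the quoted formula):

```java
public class ProjectedReplicasDemo {

    // projectedReplicas = loadedNotDropping + loading - max(0, movingFrom - movingTo),
    // as quoted above from StrategicSegmentAssigner.
    static int projectedReplicas(int loadedNotDropping, int loading,
                                 int movingFrom, int movingTo) {
        return loadedNotDropping + loading - Math.max(0, movingFrom - movingTo);
    }

    public static void main(String[] args) {
        // In the race above, the duty run snapshots S as loaded (and not
        // dropping) on both A and B, because peonA.unmarkSegmentToDrop(s)
        // ran concurrently; moveFrom and moveTo are both 0.
        int projected = projectedReplicas(2, 0, 0, 0);
        int required = 1;
        // projected (2) > required (1), so a drop gets scheduled on B.
        System.out.println("projected=" + projected
            + ", extraDrops=" + Math.max(0, projected - required));
    }
}
```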
   
   #### Solution
   
   I think we can fix this one by:
   - Using a `synchronized (lock)` block inside 
`HttpLoadQueuePeon.getSegmentsMarkedToDrop()`
   - Not calling `peonA.unmarkSegmentToDrop()` explicitly in 
[SegmentLoadQueueManager#L128](https://github.com/apache/druid/blob/be10abdbe0208ae007d47cfa0b4aeb0ea8d713f4/server/src/main/java/org/apache/druid/server/coordinator/loading/SegmentLoadQueueManager.java#L128)
   - Instead, performing `segmentsMarkedToDrop.remove(s)` inside 
`HttpLoadQueuePeon.dropSegment()`, within the same synchronized block
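   A rough sketch of that locking discipline (a simplified stand-in modeled 
loosely on `HttpLoadQueuePeon`, not the actual class; treat the field and 
method names as assumptions):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the proposed fix: every access to segmentsMarkedToDrop goes
// through one lock, and dropSegment() itself un-marks the segment, so the
// "unmark" and "queue the drop" steps can never be observed separately.
// Simplified stand-in for HttpLoadQueuePeon, not Druid's real class.
public class SynchronizedPeonSketch {

    private final Object lock = new Object();
    private final Set<String> segmentsMarkedToDrop = new HashSet<>();
    private final Set<String> queuedDrops = new HashSet<>();

    public void markSegmentToDrop(String segment) {
        synchronized (lock) {
            segmentsMarkedToDrop.add(segment);
        }
    }

    public Set<String> getSegmentsMarkedToDrop() {
        synchronized (lock) {
            // snapshot under the lock, so a duty run sees a consistent view
            return new HashSet<>(segmentsMarkedToDrop);
        }
    }

    public void dropSegment(String segment) {
        synchronized (lock) {
            // un-mark and enqueue atomically: a concurrent duty run either
            // still sees S marked to drop, or sees the drop already queued
            segmentsMarkedToDrop.remove(segment);
            queuedDrops.add(segment);
        }
    }

    public boolean isDropQueued(String segment) {
        synchronized (lock) {
            return queuedDrops.contains(segment);
        }
    }
}
```

   With this shape, the T1/T2 interleaving above can no longer observe S as 
"loaded and not marked to drop" on A while the drop is in flight.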
   
   @jtuglu1 , please let me know if either of these solutions addresses the 
issue you have encountered.
   I think Solution B is something we should do anyway, as it makes the whole 
flow more thread-safe.

