kfaraz commented on issue #18764: URL: https://github.com/apache/druid/issues/18764#issuecomment-3557691967
@jtuglu1 , thanks for reporting this! IIUC, there are two parts to the problem.

### A

> The issue is that there is a momentary lapse in time between when server B loads the segment (and it [appears](https://github.com/apache/druid/blob/-/server/src/main/java/org/apache/druid/server/coordinator/loading/SegmentReplicaCountMap.java#L54) as loaded in accounting) and when the callback for dropping on server A occurs.

I think this may be true, but more from the Broker's perspective, i.e. it is possible that the Broker sees the drop from A and only then the load on B, causing the segment to be unavailable for a brief period. Since the Coordinator is not aware of the inventory of the individual Brokers, there isn't much we can do here. Generally, the Coordinator's own inventory is meant to serve as a proxy for what the Brokers see.

If the Coordinator's inventory does not have S on B, the Coordinator wouldn't decide to remove the segment from B anyway. On the contrary, it might try to load S somewhere (A, B, or maybe even C), leading to over-replication in the next cycle and then another drop from somewhere. But I assume this is not the case discussed here.

#### Solution

Even though this is not the bug reported in this ticket, there is room for improvement (or at least an alternate strategy). We could delay the drop from A until the Coordinator is sure that B has loaded S. (This would also save the extra work which sometimes ends up nullifying the move from A to B by reloading the segment on A 😅.)

- An easy way to do that would be to leave S in `segmentsMarkedToDrop` of peon A and do nothing in the callback of `SegmentLoadQueueManager`.
- When S shows up on B and we recognize it as over-replicated (in a later duty run), we prioritize dropping the extra S from A (since it is already marked to drop on A).
- We might also need to tweak the calculation of replicas so that `movingFrom` segments also count towards over-replication.
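A rough sketch of this deferred-drop idea, using hypothetical, simplified types (`DeferredDropSketch`, `chooseServerToDropFrom`, and the string-keyed maps are all illustrative — the real `LoadQueuePeon` and duty classes are considerably more involved):

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the deferred-drop strategy described above.
class DeferredDropSketch
{
  // Segments we have decided to move away from a server but have not dropped yet.
  // Mirrors the idea of leaving S in peon A's segmentsMarkedToDrop.
  private final Map<String, Set<String>> markedToDrop = new ConcurrentHashMap<>();

  void moveSegment(String segmentId, String fromServer, String toServer)
  {
    // Step 1: request the load on the target server (not shown here).
    // Step 2: do NOT drop from the source yet; just leave it marked.
    markedToDrop.computeIfAbsent(fromServer, s -> ConcurrentHashMap.newKeySet())
                .add(segmentId);
  }

  // Called on a later duty run, once the Coordinator's own inventory shows the
  // segment as over-replicated. Prefer dropping a replica that is already
  // marked to drop (i.e. the source of an earlier move).
  String chooseServerToDropFrom(String segmentId, Set<String> serversWithSegment)
  {
    for (String server : serversWithSegment) {
      Set<String> marked = markedToDrop.get(server);
      if (marked != null && marked.remove(segmentId)) {
        return server;  // the stale replica left behind by the move
      }
    }
    // Otherwise, fall back to whatever the balancing strategy would pick.
    return serversWithSegment.iterator().next();
  }
}
```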
The drawback, I suppose, is that segment balancing would now seem slower, since segments would stick around on A until the next Coordinator run (typically 1 minute).

### B

> Because the projectedReplicas (calculated [here](https://github.com/apache/druid/blob/-/server/src/main/java/org/apache/druid/server/coordinator/loading/StrategicSegmentAssigner.java#L266)) is: loadedNotDropping + loading - max(0, moveFrom - moveTo) = 2 (both A/B are loaded at this point) + 0 - max(0, 1 - 1) = 2, this causes the drop to be subsequently scheduled on B.

This can probably play out as follows (presumably T1 is the Coordinator duty thread and T2 is the load queue callback thread):

- T1: Duty run starts
- T1: Initializes `ServerHolder` for A
- T2: `SegmentLoadQueueManager` calls `peonA.unmarkSegmentToDrop(s)`
- T1: Considers segment S to have status `loaded` on A
- T2: `SegmentLoadQueueManager` calls `peonA.dropSegment(s)`
- T1: Runs rules assuming S is loaded on both A and B

This seems closer to what you are witnessing. Just to clarify though, the `moveFrom` and `moveTo` counts would both be 0 in this case.

#### Solution

I think we can fix this one by:

- Using a `synchronized (lock)` inside `HttpLoadQueuePeon.getSegmentsMarkedToDrop()`.
- Not calling `peonA.unmarkSegmentToDrop()` explicitly in [SegmentLoadQueueManager#L128](https://github.com/apache/druid/blob/be10abdbe0208ae007d47cfa0b4aeb0ea8d713f4/server/src/main/java/org/apache/druid/server/coordinator/loading/SegmentLoadQueueManager.java#L128).
- Instead, performing `segmentsMarkedToDrop.remove(s)` inside `HttpLoadQueuePeon.dropSegment()`, within the synchronized block.

@jtuglu1 , please let me know if either of these solutions works for the issue that you have encountered. I think solution B is something we should do anyway, as it makes the whole flow more thread-safe.
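For illustration, here is a minimal, hypothetical sketch of the solution B fix — this is NOT the real `HttpLoadQueuePeon` (the names `PeonSketch`, `segmentsToDrop`, and `isDroppingSegment` are invented); it only shows the shape of the change: unmarking and enqueueing the drop happen atomically under one lock, instead of as two separate calls from the manager:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative stand-in for HttpLoadQueuePeon; names and structure are simplified.
class PeonSketch
{
  private final Object lock = new Object();
  private final Set<String> segmentsMarkedToDrop = new HashSet<>();
  private final Set<String> segmentsToDrop = new HashSet<>();

  void markSegmentToDrop(String segmentId)
  {
    synchronized (lock) {
      segmentsMarkedToDrop.add(segmentId);
    }
  }

  // Returns a snapshot taken under the lock, so a concurrent duty run can
  // never observe the segment as neither marked-to-drop nor queued-to-drop.
  Set<String> getSegmentsMarkedToDrop()
  {
    synchronized (lock) {
      return new HashSet<>(segmentsMarkedToDrop);
    }
  }

  // Unmark and enqueue the drop in one atomic step, instead of having the
  // caller unmark first and then call dropSegment. The old two-step sequence
  // left a window where S looked "loaded, not dropping" to a concurrent duty.
  void dropSegment(String segmentId)
  {
    synchronized (lock) {
      segmentsMarkedToDrop.remove(segmentId);
      segmentsToDrop.add(segmentId);
    }
  }

  boolean isDroppingSegment(String segmentId)
  {
    synchronized (lock) {
      return segmentsToDrop.contains(segmentId);
    }
  }
}
```

With this shape, a duty run that calls `getSegmentsMarkedToDrop()` sees the segment either still marked or already dropping, never in the in-between state from the T1/T2 interleaving above.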
