jtuglu1 commented on PR #18824: URL: https://github.com/apache/druid/pull/18824#issuecomment-3624733524
Hi @kfaraz, thanks for the patch. While I think this does potentially fix some other issues (and is just generally good to have), I don't think this fully fixes the issue in the aforementioned issue: - The server inventory view is updated to show `loaded=2` (callback on B is showing). - We take a copy of the server inventory view in `prepareServers()` which is then incrementing loaded here. - Then, load on B + drop on A callbacks happen. - We take a copy of the `queuedSegments` in all the HttpLoadQueuePeon in `prepareCluster()` routine. Since the callbacks have already completed, the copy doesn't see any dropping or loading, etc. (it's effectively empty for that segment). But the old "stale" copy of the server inventory view shows `loaded=2`. While this is still improvement, I don't think it fixes the core issue that the server inventory and the httploadqueue peons can be "snapshotted" non-atomically, allowing for how we [build](https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/server/coordinator/loading/SegmentReplicaCountMap.java#L52-L61) the SegmentReplicaCount to be messed up. We need to have a way to ensure the `ServerHolder` and the `HttpLoadQueuePeon` can be atomically snapshotted without any callback interleavings happening. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
