jtuglu1 commented on PR #18824:
URL: https://github.com/apache/druid/pull/18824#issuecomment-3624733524

   Hi @kfaraz, thanks for the patch. While I think this does potentially fix 
some other issues (and is just generally good to have),  I don't think this 
fully fixes the issue in the aforementioned issue:
   
   - The server inventory view is updated to show `loaded=2` (callback on B is 
showing).
   - We take a copy of the server inventory view in `prepareServers()` which is 
then incrementing loaded here.
   - Then, load on B + drop on A callbacks happen.
   - We take a copy of the `queuedSegments` in all the HttpLoadQueuePeon in 
`prepareCluster()` routine. Since the callbacks have already completed, the 
copy doesn't see any dropping or loading, etc. (it's effectively empty for that 
segment). But the old "stale" copy of the server inventory view shows 
`loaded=2`. 
   
   While this is still improvement, I don't think it fixes the core issue that 
the server inventory and the httploadqueue peons can be "snapshotted" 
non-atomically, allowing for how we 
[build](https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/server/coordinator/loading/SegmentReplicaCountMap.java#L52-L61)
 the SegmentReplicaCount to be messed up. We need to have a way to ensure the 
`ServerHolder` and the `HttpLoadQueuePeon` can be atomically snapshotted 
without any callback interleavings happening.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to