a2l007 commented on issue #10193:
URL: https://github.com/apache/druid/issues/10193#issuecomment-660282549


   > @a2l007 thanks for filing this issue. Does this bug leave the coordinator 
confused until the historical announces those segments? That is, if a segment was 
removed from a loadQueuePeon but a historical loaded and announced it later, would 
the server view be updated properly so the coordinator could make a valid 
decision?
   
   Yeah, this is exactly what's happening. Because of this behavior, the 
Coordinator thinks a specific set of historicals has spare capacity and keeps 
assigning load requests to them. The problem cascades as more segments arrive to 
be loaded. One of our clusters hit this issue: a few historicals were filled up 
to 100% capacity even though many other historicals sat at only 65%.
   
   > To make the historical process them faster, does setting 
druid.segmentCache.numLoadingThreads to something high also help?
   
   Yes, that helps to a certain extent as well. However, some clusters keep this 
property at its default in order to minimize the ZooKeeper flakiness that 
sometimes accompanies higher `druid.segmentCache.numLoadingThreads` values.
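   For reference, the property is set in the historical's `runtime.properties`; 
the value of 10 below is purely illustrative, not a recommendation:

   ```properties
   # Historical runtime.properties
   # Number of threads used to load segments from deep storage (illustrative value).
   druid.segmentCache.numLoadingThreads=10
   ```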
   
   In terms of a fix for this issue, I think using the `failedAssignCount` from 
the LoadQueuePeon, or something similar, as a factor in the balancing strategy 
could help.
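   To make the idea concrete, here is a minimal sketch of a fail-aware server 
pick. This is not Druid's actual `BalancerStrategy` API; the class, the 
`FAIL_PENALTY` weight, and the scoring formula are all hypothetical. It only 
illustrates the principle: a server whose peon keeps failing assignments should 
score worse even when its disk usage looks attractive.

   ```java
   import java.util.Comparator;
   import java.util.List;

   // Hypothetical sketch, not Druid's real balancing code: score each server by
   // disk usage plus a penalty per recent failed assignment, then pick the
   // lowest score, so peons that keep timing out stop attracting new loads.
   public class FailAwareBalancer {
       // Assumed weight per failed assignment; tuning it is out of scope here.
       static final double FAIL_PENALTY = 0.05;

       record ServerStats(String name, double usedFraction, int failedAssignCount) {}

       static String pickServer(List<ServerStats> servers) {
           return servers.stream()
               .min(Comparator.comparingDouble(
                   s -> s.usedFraction() + FAIL_PENALTY * s.failedAssignCount()))
               .map(ServerStats::name)
               .orElseThrow();
       }

       public static void main(String[] args) {
           List<ServerStats> servers = List.of(
               new ServerStats("hist-1", 0.65, 0),   // fuller, but healthy
               new ServerStats("hist-2", 0.60, 20)); // emptier, but loads keep failing
           // On usage alone hist-2 would win; with the penalty, hist-1 does.
           System.out.println(pickServer(servers));
       }
   }
   ```

   A plain usage-based strategy would keep routing segments to `hist-2` forever; 
folding the failure signal into the score is what breaks the cascade described 
above.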


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


