a2l007 commented on issue #10193: URL: https://github.com/apache/druid/issues/10193#issuecomment-660282549
> @a2l007 thanks for filing this issue. Does this bug make the coordinator confused until the historical announces those segments? Like, if a segment was removed from a loadQueuePeon but a historical loaded and announced it later, the server view would be updated properly and the coordinator could make a valid decision?

Yes, this is exactly what's happening. Due to this behavior, the coordinator thinks a specific set of historicals has enough capacity and keeps trying to assign load requests to them. The problem cascades as more segments need to be loaded. One of our clusters encountered this issue: a few historicals were loaded up to 100% even though many other historicals were only at 65% capacity.

> To make the historical process them faster, does setting druid.segmentCache.numLoadingThreads to something high also help?

Yes, that helps to a certain extent as well. However, some clusters keep the default value for this property in order to minimize the ZooKeeper flakiness that sometimes accompanies higher `druid.segmentCache.numLoadingThreads` values.

In terms of a fix for this issue, I think using the `failedAssignCount` from the LoadQueuePeon, or something similar, as a factor in the balancing strategy could help.
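To illustrate the proposed direction, here is a minimal sketch (not Druid's actual balancer API; the class names, the `failurePenalty` weight, and the cost formula are all hypothetical) of how a peon's failed-assignment count could penalize a server when ranking assignment targets, so a historical that looks empty but keeps failing loads stops attracting new segments:

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical per-server stats; in Druid the failed count would come
// from the LoadQueuePeon and the usage ratio from the server view.
class ServerStats {
    final String name;
    final double usedRatio;        // fraction of storage capacity in use
    final int failedAssignCount;   // failed load assignments reported by the peon

    ServerStats(String name, double usedRatio, int failedAssignCount) {
        this.name = name;
        this.usedRatio = usedRatio;
        this.failedAssignCount = failedAssignCount;
    }

    // Higher cost = less attractive target; the linear penalty is illustrative.
    double assignCost(double failurePenalty) {
        return usedRatio + failurePenalty * failedAssignCount;
    }
}

class BalancerSketch {
    // Pick the server with the lowest combined cost of disk usage
    // and recent assignment failures.
    static ServerStats pickTarget(List<ServerStats> servers, double failurePenalty) {
        return servers.stream()
                .min(Comparator.comparingDouble(s -> s.assignCost(failurePenalty)))
                .orElseThrow();
    }
}
```

With a nonzero penalty, a server at 65% usage but with many failed assignments can rank below a healthy server at 70%, which is the cascading behavior described above.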
