yuanlihan commented on pull request #11257: URL: https://github.com/apache/druid/pull/11257#issuecomment-873774263
> I haven't read the code yet, just reading comments and responding to my tag from @a2l007 ... I do think it is very helpful to keep the `percentOfSegmentsToConsider` value. Our largest cluster has over 2MM segments with replication and slashing that value down has allowed us to cut a large chunk out of coordination time in this phase by short circuiting at what we have deemed to be an acceptable point in the iteration. I'll try to read through full PR today and update my comment. > > Also this patch and the issue it is resolving would help with the underlying performance issues of the design here. My patch to short circuit was more of a bandaid. This seems like a design improvement that visits the root issue of this method being invoked more than needed! So maybe my tuning config is less helpful once this PR is in? Or maybe it is still useful and both together would be great? Hi @capistrant, thanks for your comment. It's great to know that we both have tried to solve this issue. Ideally, this patch should be able to solve this issue fundamentally while it makes sense to me to evolve gradually in engineering. So, I agree to retain the `percentOfSegmentsToConsider` improvement and add an another parameter `useBatchedSegmentSampler` to enable to the new improvement as suggested by previous comments. And it'll be great if you can help to review this PR! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
