yuanlihan commented on pull request #11257:
URL: https://github.com/apache/druid/pull/11257#issuecomment-873774263


   > I haven't read the code yet, just reading comments and responding to my 
tag from @a2l007 ... I do think it is very helpful to keep the 
`percentOfSegmentsToConsider` value. Our largest cluster has over 2MM segments 
with replication and slashing that value down has allowed us to cut a large 
chunk out of coordination time in this phase by short circuiting at what we 
have deemed to be an acceptable point in the iteration. I'll try to read 
through full PR today and update my comment.
   > 
   > Also this patch and the issue it is resolving would help with the 
underlying performance issues of the design here. My patch to short circuit was 
more of a bandaid. This seems like a design improvement that visits the root 
issue of this method being invoked more than needed! So maybe my tuning config 
is less helpful once this PR is in? Or maybe it is still useful and both 
together would be great?
   
   Hi @capistrant, thanks for your comment. It's great to know that we both 
have tried to solve this issue. 
   
   Ideally, this patch should be able to solve this issue fundamentally while 
it makes sense to me to evolve gradually in engineering. So, I agree to retain 
the `percentOfSegmentsToConsider` improvement and add an another parameter 
`useBatchedSegmentSampler` to enable to the new improvement as suggested by 
previous comments. 
   
   And it'll be great if you can help to review this PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to