weizuo93 opened a new issue #4988:
URL: https://github.com/apache/incubator-doris/issues/4988


   The current tablet selection strategy for compaction traverses all tablets 
and picks the tablet with the highest score. Traversing every tablet each time 
a compaction task is generated is expensive.
   
   Can we hold a `heap` with a specified number of candidate tablets for each disk?
   
   Heap elements are sorted by score, and the top tablet in the `heap` holds the 
highest score. The top element in the `heap` is selected when the producer 
generates a compaction task for this disk. After a tablet is selected for 
compaction, a `pop` for this tablet is unnecessary, but an `update and sort heap` 
step is needed so that the tablet's new score is reflected in the ordering.
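
   The select-then-update step described above could be sketched as follows. 
This is only an illustration under assumptions: the class name 
`CandidateTablets`, its methods, and the integer score type are hypothetical 
and are not existing Doris code. It also swaps the binary heap for an ordered 
`std::set`, because `std::priority_queue` offers no way to update the score of 
an element that stays in the container.

```cpp
#include <cstdint>
#include <set>
#include <unordered_map>
#include <utility>

// Hypothetical per-disk candidate container (illustrative names, not Doris APIs).
// Not thread-safe: callers would need to guard it, e.g. with a per-disk mutex.
class CandidateTablets {
public:
    // Insert a tablet, or refresh its score if it is already tracked.
    // This plays the role of "update and sort heap" after a compaction.
    void upsert(int64_t tablet_id, uint32_t score) {
        auto it = score_of_.find(tablet_id);
        if (it != score_of_.end()) {
            // Remove the stale (score, id) entry before re-inserting.
            ordered_.erase(std::make_pair(it->second, tablet_id));
            it->second = score;
        } else {
            score_of_.emplace(tablet_id, score);
        }
        ordered_.insert(std::make_pair(score, tablet_id));
    }

    // Write the id of the highest-score tablet to *tablet_id without popping it.
    // Returns false if there are no candidates.
    bool top(int64_t* tablet_id) const {
        if (ordered_.empty()) {
            return false;
        }
        *tablet_id = ordered_.rbegin()->second;
        return true;
    }

private:
    // Ordered ascending by (score, tablet_id); the last element is the top.
    std::set<std::pair<uint32_t, int64_t>> ordered_;
    // Current score of each tracked tablet, needed to find its set entry.
    std::unordered_map<int64_t, uint32_t> score_of_;
};
```

   With this layout, picking a tablet is a constant-time lookup of the last 
element, and refreshing a score after compaction costs O(log k) for k 
candidates instead of a traversal of all tablets.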
   
   A new tablet will be pushed into the `heap` when either of the following occurs:
   (1) a `load` operation is performed on the tablet;
   (2) a `scan` operation is performed on the tablet `n` times, where `n` is 
configurable.
   To keep the number of tablets in the `heap` constant, the last tablet will be 
popped from the `heap` when a new tablet is pushed into it.
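
   The two admission triggers and the fixed-size rule above might look roughly 
like the sketch below. Everything named here (`BoundedCandidates`, `on_load`, 
`on_scan`, `capacity`, `scan_threshold`) is a hypothetical illustration, not 
Doris code. It assumes that "the last tablet" means the one with the lowest 
score, and it again uses an ordered `std::set` in place of a heap so that the 
lowest element can be removed directly.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <unordered_map>
#include <utility>

// Hypothetical fixed-capacity candidate set for one disk (not thread-safe).
class BoundedCandidates {
public:
    BoundedCandidates(size_t capacity, uint32_t scan_threshold)
            : capacity_(capacity), scan_threshold_(scan_threshold) {}

    // Trigger (1): a load on the tablet proposes it as a candidate.
    void on_load(int64_t tablet_id, uint32_t score) { push(tablet_id, score); }

    // Trigger (2): propose the tablet once it has been scanned n times,
    // where n is scan_threshold_. The counter restarts afterwards.
    void on_scan(int64_t tablet_id, uint32_t score) {
        if (++scan_count_[tablet_id] >= scan_threshold_) {
            scan_count_.erase(tablet_id);
            push(tablet_id, score);
        }
    }

    bool contains(int64_t tablet_id) const { return score_of_.count(tablet_id) > 0; }
    size_t size() const { return ordered_.size(); }

private:
    void push(int64_t tablet_id, uint32_t score) {
        // Drop any stale entry so a tablet is never tracked twice.
        auto it = score_of_.find(tablet_id);
        if (it != score_of_.end()) {
            ordered_.erase(std::make_pair(it->second, tablet_id));
            score_of_.erase(it);
        }
        ordered_.insert(std::make_pair(score, tablet_id));
        score_of_[tablet_id] = score;
        // Keep the size constant: evict the lowest-score entry (assumption).
        if (ordered_.size() > capacity_) {
            auto lowest = ordered_.begin();
            score_of_.erase(lowest->second);
            ordered_.erase(lowest);
        }
    }

    size_t capacity_;
    uint32_t scan_threshold_;
    std::set<std::pair<uint32_t, int64_t>> ordered_;   // ascending by (score, id)
    std::unordered_map<int64_t, uint32_t> score_of_;   // tablet id -> score
    std::unordered_map<int64_t, uint32_t> scan_count_; // scans since last push
};
```

   One consequence of this eviction rule is that a newly pushed tablet whose 
score is lower than every existing candidate is evicted immediately, so the set 
always keeps the highest-scoring tablets it has seen.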
   
   It seems that this mechanism can avoid traversing all tablets when selecting 
a tablet for compaction task. Of course, some thread-safety problems need to be 
addressed.


