weizuo93 opened a new issue #4988:
URL: https://github.com/apache/incubator-doris/issues/4988
The current tablet selection strategy for compaction tasks traverses all
tablets to find the one with the highest score. This makes every tablet
selection expensive.
Can we keep a `heap` holding a specified number of candidate tablets for each
disk? Heap elements are ordered by score, so the top tablet in the `heap` has
the highest score. The top element in the `heap` is selected when the producer
generates a compaction task for that disk. After a tablet is selected for
compaction, it does not need to be popped, but its score must be updated and
the `heap` re-sorted.
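
A minimal Python sketch of this selection step, assuming integer tablet ids and numeric scores. The class and method names (`DiskCandidates`, `add`, `top`, `update_score`) are hypothetical and do not correspond to existing Doris code; the real implementation would live in the BE.

```python
import heapq


class DiskCandidates:
    """Hypothetical per-disk candidate heap (illustrative names only)."""

    def __init__(self):
        # heapq is a min-heap, so scores are stored negated to keep the
        # highest-scoring tablet at index 0. Entries are [-score, tablet_id].
        self._heap = []

    def add(self, tablet_id, score):
        heapq.heappush(self._heap, [-score, tablet_id])

    def top(self):
        """Return (tablet_id, score) of the best candidate without popping it."""
        if not self._heap:
            return None
        neg_score, tablet_id = self._heap[0]
        return tablet_id, -neg_score

    def update_score(self, tablet_id, new_score):
        """Re-score a tablet after compaction and restore heap order."""
        for entry in self._heap:
            if entry[1] == tablet_id:
                entry[0] = -new_score
                break
        # Re-heapify is linear in the (small, fixed) number of candidates.
        heapq.heapify(self._heap)
```

After compaction the tablet stays in the heap with its new, presumably lower, score, so the next call to `top()` returns whichever candidate now ranks highest.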
A new tablet is pushed into the `heap` when one of the following occurs:
(1) a `load` operation is performed on the tablet;
(2) a `scan` operation has been performed on the tablet `n` times, where `n`
is configurable.
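
The two triggers could be sketched in Python as follows. `CandidateTrigger`, `scan_threshold`, and the `on_candidate` callback are hypothetical names, and resetting the scan counter once it reaches `n` is an assumption that the proposal does not specify.

```python
from collections import defaultdict


class CandidateTrigger:
    """Hypothetical trigger logic deciding when a tablet becomes a candidate."""

    def __init__(self, scan_threshold, on_candidate):
        self.scan_threshold = scan_threshold  # the configurable `n`
        self.on_candidate = on_candidate      # callback that pushes into the heap
        self._scan_counts = defaultdict(int)

    def on_load(self, tablet_id):
        # Trigger (1): every load makes the tablet a candidate.
        self.on_candidate(tablet_id)

    def on_scan(self, tablet_id):
        # Trigger (2): the tablet becomes a candidate on its n-th scan.
        self._scan_counts[tablet_id] += 1
        if self._scan_counts[tablet_id] >= self.scan_threshold:
            self._scan_counts[tablet_id] = 0  # assumption: start counting again
            self.on_candidate(tablet_id)
```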
To keep the number of tablets in the `heap` constant, the last tablet is
popped from the `heap` whenever a new tablet is pushed into it.
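
One way to read this is as a bounded, score-ordered container: the sketch below assumes that "the last tablet" means the one with the lowest score, and it uses a sorted list rather than a true heap because the candidate count is small. `BoundedCandidates` and its methods are hypothetical names. Under this reading, a new tablet whose score is the lowest is itself the one evicted, and a tablet that is pushed twice would appear twice, which a real implementation would need to handle.

```python
import bisect


class BoundedCandidates:
    """Hypothetical fixed-capacity candidate set ordered by score."""

    def __init__(self, capacity):
        self.capacity = capacity
        # (-score, tablet_id) tuples kept in ascending order, so the
        # highest score is first and the lowest score is last.
        self._items = []

    def push(self, tablet_id, score):
        """Insert a tablet; return the evicted tablet_id, or None."""
        bisect.insort(self._items, (-score, tablet_id))
        if len(self._items) > self.capacity:
            _, evicted = self._items.pop()  # drop the lowest-scoring entry
            return evicted
        return None

    def top(self):
        """Return the tablet_id with the highest score, or None if empty."""
        return self._items[0][1] if self._items else None
```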
This mechanism should avoid traversing all tablets when selecting a tablet
for a compaction task. Of course, some thread-safety problems would need to be
addressed.
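
The simplest way to address the thread-safety concern is to guard the heap with a single lock, as in the Python sketch below. `LockedCandidates` is a hypothetical name, and one coarse lock per disk is only an assumption; finer-grained schemes may be preferable if load and scan threads contend heavily with the compaction producer.

```python
import heapq
import threading


class LockedCandidates:
    """Hypothetical per-disk heap guarded by one mutex."""

    def __init__(self):
        self._lock = threading.Lock()
        self._heap = []  # (-score, tablet_id); highest score at index 0

    def push(self, tablet_id, score):
        # Called from load/scan threads.
        with self._lock:
            heapq.heappush(self._heap, (-score, tablet_id))

    def top(self):
        # Called from the compaction producer thread.
        with self._lock:
            return self._heap[0][1] if self._heap else None

    def size(self):
        with self._lock:
            return len(self._heap)
```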