kfaraz commented on PR #13852:
URL: https://github.com/apache/druid/pull/13852#issuecomment-1449814414
Thanks a lot for taking this up, @suneet-s!
I had a similar approach in mind to what @imply-cheddar has described.
@suneet-s, I think you also mentioned this as a design that you
considered: breaking up the `CompactSegments` duty into two.
I feel it would be a cleaner approach and would avoid the need to add a new
config.
- Keep the `CompactSegments` duty mostly unchanged
  - it would run at the custom period, if specified; otherwise, at the default
    `indexingPeriod` (30 min)
  - it continues to call `policy.reset()` to create a new iterator
  - pass the `CompactionJobQueue` to this duty and update it with the new
    iterator
  - Optional: Rename this duty to reflect that it just identifies
    compactible intervals (maybe `GetIntervalsForCompaction` or something) and
    does not actually invoke compaction.
- Add a new duty `QueueCompactionTasks`
  - This runs as frequently as other coordinator duties (maybe along with
    historical management duties, i.e. every 1 min or so).
  - Avoids the need for a separate executor/thread and is frequent
    enough to make the best use of available compaction task slots.
  - Asks the `CompactionJobQueue` for the next compaction tasks to queue
    and sends them to the Overlord
  - For the most part, this translates to moving the
    `CompactSegments.doRun()` method to this new duty
- Stats computation
  - It can remain in `CompactSegments` if it is a costly operation
  - If required, we could even choose to expand `CompactionJobQueue` to be
    a `ClusterCompactionState` which could accumulate stats and snapshots as and
    when they are built and report them when necessary.
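
To make the split above concrete, here is a rough, self-contained sketch of how
the two duties could share a `CompactionJobQueue`. Only the names
`CompactionJobQueue` and `QueueCompactionTasks` come from the proposal; the
class `IdentifyCompactibleIntervals`, the methods `resetIterator` and `poll`,
the use of plain strings for intervals, and the in-memory `submitted` list
standing in for the Overlord are all assumptions made for illustration and are
not Druid's actual API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class CompactionDutiesSketch {

  /** Hypothetical shared state: holds the latest iterator of compactible intervals. */
  static class CompactionJobQueue {
    private Iterator<String> intervals = Collections.emptyIterator();

    /** Called by the slow duty each time it rebuilds the iterator. */
    synchronized void resetIterator(Iterator<String> newIntervals) {
      this.intervals = newIntervals;
    }

    /** Called by the fast duty: returns at most maxJobs intervals to compact next. */
    synchronized List<String> poll(int maxJobs) {
      List<String> next = new ArrayList<>();
      while (next.size() < maxJobs && intervals.hasNext()) {
        next.add(intervals.next());
      }
      return next;
    }
  }

  /** Slow duty (hypothetical name): identifies intervals but does not submit tasks. */
  static class IdentifyCompactibleIntervals {
    private final CompactionJobQueue queue;

    IdentifyCompactibleIntervals(CompactionJobQueue queue) {
      this.queue = queue;
    }

    /** Stand-in for the part of the duty that resets the policy and builds a new iterator. */
    void run(List<String> compactibleIntervals) {
      queue.resetIterator(new ArrayList<>(compactibleIntervals).iterator());
    }
  }

  /** Fast duty: drains the queue up to the number of free compaction task slots. */
  static class QueueCompactionTasks {
    private final CompactionJobQueue queue;
    // Assumption: an in-memory list stands in for submitting tasks to the Overlord.
    final List<String> submitted = new ArrayList<>();

    QueueCompactionTasks(CompactionJobQueue queue) {
      this.queue = queue;
    }

    List<String> run(int availableTaskSlots) {
      List<String> jobs = queue.poll(availableTaskSlots);
      submitted.addAll(jobs);
      return jobs;
    }
  }

  public static void main(String[] args) {
    CompactionJobQueue queue = new CompactionJobQueue();
    IdentifyCompactibleIntervals slowDuty = new IdentifyCompactibleIntervals(queue);
    QueueCompactionTasks fastDuty = new QueueCompactionTasks(queue);

    // One slow run (e.g. every 30 min) followed by several fast runs (e.g. every 1 min).
    List<String> found = new ArrayList<>();
    found.add("2023-01-01/2023-01-02");
    found.add("2023-01-02/2023-01-03");
    found.add("2023-01-03/2023-01-04");
    slowDuty.run(found);

    System.out.println("fast run 1: " + fastDuty.run(2));
    System.out.println("fast run 2: " + fastDuty.run(2));
    System.out.println("fast run 3: " + fastDuty.run(2));
  }
}
```

In this sketch the fast duty never recomputes anything expensive; it only
consumes whatever the slow duty last published, which is the property that lets
it run on the regular coordinator cycle without its own executor.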
Another advantage of this approach is that users can benefit from it
even if they don't specify a custom period for `CompactSegments`. (In the
future, it might even help us do away with having `CompactSegments` as a custom
duty, which is a little weird since compaction is a core feature of the
ingestion system.)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]