kfaraz commented on code in PR #18402:
URL: https://github.com/apache/druid/pull/18402#discussion_r2465776914
##########
indexing-service/src/main/java/org/apache/druid/indexing/compact/OverlordCompactionScheduler.java:
##########
@@ -107,43 +121,57 @@ public class OverlordCompactionScheduler implements
CompactionScheduler
private final AtomicBoolean isLeader = new AtomicBoolean(false);
private final AtomicBoolean started = new AtomicBoolean(false);
- private final CompactSegments duty;
/**
* The scheduler should enable/disable polling of segments only if the
Overlord
* is running in standalone mode, otherwise this is handled by the
DruidCoordinator
* class itself.
*/
private final boolean shouldPollSegments;
+ private final long schedulePeriodMillis;
private final Stopwatch sinceStatsEmitted = Stopwatch.createUnstarted();
@Inject
public OverlordCompactionScheduler(
TaskMaster taskMaster,
+ GlobalTaskLockbox taskLockbox,
TaskQueryTool taskQueryTool,
SegmentsMetadataManager segmentManager,
+ SegmentsMetadataManagerConfig segmentManagerConfig,
Supplier<DruidCompactionConfig> compactionConfigSupplier,
CompactionStatusTracker statusTracker,
CoordinatorOverlordServiceConfig coordinatorOverlordServiceConfig,
+ TaskActionClientFactory taskActionClientFactory,
+ DruidInputSourceFactory druidInputSourceFactory,
ScheduledExecutorFactory executorFactory,
+ BrokerClient brokerClient,
ServiceEmitter emitter,
ObjectMapper objectMapper
)
{
+ final long segmentPollPeriodSeconds =
+
segmentManagerConfig.getPollDuration().toStandardDuration().getMillis();
+ this.schedulePeriodMillis = Math.min(5_000, segmentPollPeriodSeconds);
Review Comment:
Yes, this value has been carried over from the original iteration of the
`OverlordCompactionScheduler`.
I had intended the scheduler to be able to pick up compaction jobs as soon
as compaction task slots become available but it is wasteful to recompute the
entire queue for that, especially since on large clusters, recomputation of the
entire queue may take up to a couple of minutes (seen from `coordinator/time`
metric on large clusters).
We can do the following instead:
- Keep the schedule period as 5 minutes (or may be even higher?)
- When the scheduler kicks in, reset the job queue.
- We already receive a callback whenever a task completes.
- When a task completes and slots become available, check if there is any
pending job still in the queue from the last scheduled run. If there is a
pending job, launch that.
Please let me know if this makes sense.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]