gianm commented on code in PR #17446:
URL: https://github.com/apache/druid/pull/17446#discussion_r1833035823
##########
extensions-contrib/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/KubernetesTaskRunner.java:
##########
@@ -321,16 +333,40 @@ public List<Pair<Task, ListenableFuture<TaskStatus>>>
restore()
public void start()
{
log.info("Starting K8sTaskRunner...");
- // Load tasks from previously running jobs and wait for their statuses to
be updated asynchronously.
- for (Job job : client.getPeonJobs()) {
+ // Load tasks from previously running jobs and wait for their statuses to
start running.
+ final List<ListenableFuture<Boolean>> taskStatusActiveList = new
ArrayList<>();
+ final List<Job> peonJobs = client.getPeonJobs();
+
+ log.info("Locating [%,d] active tasks.", peonJobs.size());
+ for (Job job : peonJobs) {
try {
- joinAsync(adapter.toTask(job));
+ KubernetesWorkItem kubernetesWorkItem = joinAsync(adapter.toTask(job));
+
taskStatusActiveList.add(kubernetesWorkItem.getPeonLifeycle().getTaskStartedSuccessfullyFuture());
}
catch (IOException e) {
log.error(e, "Error deserializing task from job [%s]",
job.getMetadata().getName());
}
}
- log.info("Loaded %,d tasks from previous run", tasks.size());
+
+ try {
+ final DateTime nowUtc = DateTimes.nowUtc();
+ final long timeoutMs =
nowUtc.plus(config.getTaskJoinTimeout()).getMillis() - nowUtc.getMillis();
+ if (timeoutMs > 0) {
+ FutureUtils.coalesce(taskStatusActiveList).get(timeoutMs,
TimeUnit.MILLISECONDS);
+ }
+ log.info("Located [%,d] active tasks.", taskStatusActiveList.size());
+ }
+ catch (Exception e) {
+ final long numInitialized =
+ tasks.values().stream().filter(item ->
item.getPeonLifeycle().getTaskStartedSuccessfullyFuture().isDone()).count();
Review Comment:
I don't see anything that recovers from failed `join`, so I suppose there's
three cases?
- future is done and `true` — the task was joined
- future is done and `false` — error joining the task, i suppose it's never
going to be joined? at this point should we mark it failed?
- future is not yet done — the task is still being joined async
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]