gianm commented on code in PR #17446:
URL: https://github.com/apache/druid/pull/17446#discussion_r1833021788
##########
extensions-contrib/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/KubernetesTaskRunner.java:
##########
@@ -321,16 +333,40 @@ public List<Pair<Task, ListenableFuture<TaskStatus>>> restore()
public void start()
{
log.info("Starting K8sTaskRunner...");
- // Load tasks from previously running jobs and wait for their statuses to be updated asynchronously.
- for (Job job : client.getPeonJobs()) {
+ // Load tasks from previously running jobs and wait for their statuses to start running.
+ final List<ListenableFuture<Boolean>> taskStatusActiveList = new ArrayList<>();
+ final List<Job> peonJobs = client.getPeonJobs();
+
+ log.info("Locating [%,d] active tasks.", peonJobs.size());
+ for (Job job : peonJobs) {
try {
- joinAsync(adapter.toTask(job));
+ KubernetesWorkItem kubernetesWorkItem = joinAsync(adapter.toTask(job));
+ taskStatusActiveList.add(kubernetesWorkItem.getPeonLifeycle().getTaskStartedSuccessfullyFuture());
}
catch (IOException e) {
log.error(e, "Error deserializing task from job [%s]", job.getMetadata().getName());
}
}
- log.info("Loaded %,d tasks from previous run", tasks.size());
+
+ try {
+ final DateTime nowUtc = DateTimes.nowUtc();
+ final long timeoutMs = nowUtc.plus(config.getTaskJoinTimeout()).getMillis() - nowUtc.getMillis();
+ if (timeoutMs > 0) {
+ FutureUtils.coalesce(taskStatusActiveList).get(timeoutMs, TimeUnit.MILLISECONDS);
+ }
+ log.info("Located [%,d] active tasks.", taskStatusActiveList.size());
+ }
+ catch (Exception e) {
+ final long numInitialized =
+ tasks.values().stream().filter(item -> item.getPeonLifeycle().getTaskStartedSuccessfullyFuture().isDone()).count();
Review Comment:
This only counts how many of the futures have finished. Should it count the
ones that have finished *and* evaluate to `true`? I am wondering what the
boolean is for.
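
To make the distinction concrete, here is a minimal sketch of the two counts. It assumes `java.util.concurrent.CompletableFuture` as a stand-in for the `ListenableFuture<Boolean>` in the PR, and the class and method names (`StartedFutureCounts`, `countDone`, `countStartedSuccessfully`) are hypothetical, not part of Druid.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration only; CompletableFuture stands in for the PR's ListenableFuture.
final class StartedFutureCounts
{
  // Counts futures that completed in any way: true, false, exception, or cancellation.
  static long countDone(List<CompletableFuture<Boolean>> futures)
  {
    return futures.stream().filter(CompletableFuture::isDone).count();
  }

  // Counts only futures that completed normally with the value true.
  static long countStartedSuccessfully(List<CompletableFuture<Boolean>> futures)
  {
    return futures.stream()
                  .filter(f -> f.isDone() && !f.isCompletedExceptionally())
                  .filter(f -> Boolean.TRUE.equals(f.getNow(false)))
                  .count();
  }

  public static void main(String[] args)
  {
    CompletableFuture<Boolean> started = CompletableFuture.completedFuture(true);
    CompletableFuture<Boolean> notStarted = CompletableFuture.completedFuture(false);
    CompletableFuture<Boolean> pending = new CompletableFuture<>();

    List<CompletableFuture<Boolean>> futures = Arrays.asList(started, notStarted, pending);
    System.out.println("done: " + countDone(futures));
    System.out.println("started successfully: " + countStartedSuccessfully(futures));
  }
}
```

With an `isDone()`-only filter, a future that resolved to `false` is counted the same as one that resolved to `true`, which is the ambiguity raised above.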
##########
extensions-contrib/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/KubernetesPeonLifecycle.java:
##########
@@ -136,14 +141,16 @@ protected synchronized TaskStatus run(Job job, long
launchTimeout, long timeout,
launchTimeout,
TimeUnit.MILLISECONDS
);
-
return join(timeout);
}
catch (Exception e) {
log.info("Failed to run task: %s", taskId.getOriginalTaskId());
throw e;
}
finally {
+ if (!taskStartedSuccessfullyFuture.isDone()) {
Review Comment:
`taskStartedSuccessfullyFuture` is not set to `true` in `run` as far as I can
tell. Is that intentional?
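
If the intent is for the future to resolve to `true` once launch succeeds, one possible shape is sketched below. This is an assumption about the intended behavior, not the PR's code: `CompletableFuture` stands in for whatever future type the lifecycle uses, and `StartSignalSketch`, `launch`, and `join` are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical illustration only; not the actual KubernetesPeonLifecycle.
final class StartSignalSketch
{
  private final CompletableFuture<Boolean> taskStartedSuccessfullyFuture = new CompletableFuture<>();

  CompletableFuture<Boolean> getTaskStartedSuccessfullyFuture()
  {
    return taskStartedSuccessfullyFuture;
  }

  // launch and join are placeholders for the real launch and join steps.
  String run(Runnable launch, Supplier<String> join)
  {
    try {
      launch.run();
      // Launch returned without throwing: signal a successful start.
      taskStartedSuccessfullyFuture.complete(true);
      return join.get();
    }
    finally {
      // complete() has no effect if the future is already completed, so this
      // only resolves the future to false when the success path was not reached.
      taskStartedSuccessfullyFuture.complete(false);
    }
  }
}
```

In this shape, callers waiting on the future always get an answer, and `false` specifically means the run ended before a successful start was signaled.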
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]