gianm commented on code in PR #15174:
URL: https://github.com/apache/druid/pull/15174#discussion_r1372797681


##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java:
##########
@@ -2113,7 +2113,13 @@ public Boolean 
apply(Pair<SeekableStreamIndexTaskRunner.Status, Map<PartitionIdT
     for (int i = 0; i < results.size(); i++) {
       String taskId = futureTaskIds.get(i);
       if (results.get(i).isError() || results.get(i).valueOrThrow() == null) {
-        killTask(taskId, "Task [%s] failed to return status, killing task", 
taskId);
+        log.info("Task [%s] failed to respond, might need to kill it.", 
taskId);
+        Optional<TaskStatus> killTaskStatus = taskStorage.getStatus(taskId);
+        if (killTaskStatus.isPresent() && !killTaskStatus.get().isComplete()) {
+          // If the task completed while we were failing to chat with it, 
don't do anything.
+          log.info("Task [%s] failed to return status, killing task", taskId);
+          killTask(taskId, "Task [%s] failed to return status, killing task", 
taskId);

Review Comment:
   Seems like it would be better to fix things by making `TaskQueue#shutdown` 
(which `killTask` calls) safe to call if the task has already completed. Then 
this block can revert to being a simple `killTask` call, and all other callers 
of `TaskQueue#shutdown` will also benefit.
   
   So, what goes wrong when `TaskQueue#shutdown` is called on a task that is 
already complete? Is that fixable?



##########
server/src/main/java/org/apache/druid/segment/realtime/firehose/ChatHandlerResource.java:
##########
@@ -73,6 +74,6 @@ public Object doTaskChat(@PathParam("id") String handlerId, 
@Context HttpHeaders
       return handler.get();
     }
 
-    throw new BadRequestException(StringUtils.format("Can't find chatHandler 
for handler[%s]", handlerId));
+    throw new ServiceUnavailableException(StringUtils.format("Can't find 
chatHandler for handler[%s]", handlerId));

Review Comment:
   Deserves a comment: something like "Return HTTP 503 so 
SpecificTaskRetryPolicy retries in case the handler is not registered yet."



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to