Zeyu-Chen-SFDC commented on issue #10868:
URL: https://github.com/apache/druid/issues/10868#issuecomment-2083564681

   We have seen a number of these recently.
   
   In our particular cases of successful tasks being misreported as failures, the task shutdown sequence (triggered by registered shutdown hooks or `@LifecycleStop` annotations [here](https://github.com/apache/druid/blob/master/services/src/main/java/org/apache/druid/cli/CliPeon.java#L372-L384)) sometimes isn't clean and ends with a non-zero exit code, even though the task has already reported its status as `SUCCESS` both in its logs and in its `task_folder/status.json` file. Together, these non-zero exit codes and the MiddleManager's (MM's) [task-reaping logic](https://github.com/apache/druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/overlord/ForkingTaskRunner.java#L404-L419) lead to the misreports.
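The exit-code-first ordering described above can be modeled roughly as follows. This is a hypothetical sketch, not Druid's actual `ForkingTaskRunner` code: the class `ExitCodeFirstReaper`, the `Outcome` enum, and the `reap` method are illustrative names, and treating a missing status as a failure when the exit code is 0 is an assumption.

```java
import java.util.Optional;

public class ExitCodeFirstReaper {
    // Hypothetical outcome type; Druid's real TaskStatus carries much more.
    enum Outcome { SUCCESS, FAILED }

    // Hypothetical model of the behavior described in the comment: the exit
    // code is checked first, so a non-zero code wins even when status.json
    // already says SUCCESS.
    static Outcome reap(int exitCode, Optional<String> statusFromFile) {
        if (exitCode != 0) {
            return Outcome.FAILED;
        }
        // Assumption: with exit code 0, the status file decides; no file means failure.
        return statusFromFile
                .filter("SUCCESS"::equals)
                .map(s -> Outcome.SUCCESS)
                .orElse(Outcome.FAILED);
    }

    public static void main(String[] args) {
        // Task wrote SUCCESS, then its shutdown sequence exited with code 1.
        System.out.println(reap(1, Optional.of("SUCCESS"))); // FAILED
        // Same task with a clean shutdown.
        System.out.println(reap(0, Optional.of("SUCCESS"))); // SUCCESS
    }
}
```

The first call shows the misreport: the task delivered `SUCCESS`, but the unclean shutdown alone flips the reported outcome.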
   
   IMHO, the task-reaping logic is wrong to prioritize exit codes over the content of the `task_folder/status.json` file. A task should be reported as `SUCCESS` as long as it has written a `task_folder/status.json` file containing `"status" : "SUCCESS"`. Any number of exceptions can be thrown during the shutdown sequence, but they shouldn't affect the contract the task has already delivered.
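A minimal sketch of the proposed rule, assuming Java 11+. The class `StatusFileFirstReaper`, its `Outcome` enum, and the `readStatus`/`reap` methods are hypothetical names, not Druid APIs. Extracting the `"status"` field with a regex is a simplification that assumes it appears as a top-level string value; a real implementation would deserialize the file with a JSON library such as Jackson, which Druid already uses.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StatusFileFirstReaper {
    // Hypothetical outcome type; Druid's real TaskStatus carries much more.
    enum Outcome { SUCCESS, FAILED }

    // Simplified lookup of "status" : "<VALUE>"; does not match "statusCode" etc.
    private static final Pattern STATUS =
            Pattern.compile("\"status\"\\s*:\\s*\"([A-Z_]+)\"");

    // Returns the status the task wrote, or empty if the file is missing/unreadable.
    static Optional<String> readStatus(Path statusFile) {
        try {
            String json = Files.readString(statusFile);
            Matcher m = STATUS.matcher(json);
            return m.find() ? Optional.of(m.group(1)) : Optional.empty();
        } catch (IOException e) {
            return Optional.empty();
        }
    }

    // Proposed ordering: a SUCCESS recorded in status.json is final; the exit
    // code is only reported as a warning and never downgrades the outcome.
    static Outcome reap(int exitCode, Optional<String> statusFromFile) {
        if (statusFromFile.filter("SUCCESS"::equals).isPresent()) {
            if (exitCode != 0) {
                System.err.println("Task reported SUCCESS but exited with code " + exitCode);
            }
            return Outcome.SUCCESS;
        }
        return Outcome.FAILED;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("task_folder");
        Path statusFile = dir.resolve("status.json");
        Files.writeString(statusFile, "{\"id\" : \"task_1\", \"status\" : \"SUCCESS\"}");
        // Unclean shutdown after SUCCESS was written: still SUCCESS.
        System.out.println(reap(1, readStatus(statusFile)));
        // No status file at all: FAILED.
        System.out.println(reap(0, readStatus(dir.resolve("missing.json"))));
        Files.delete(statusFile);
        Files.delete(dir);
    }
}
```

The exit code is still surfaced, so an unclean shutdown remains visible to operators, but it no longer overrides a `SUCCESS` the task has already recorded.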


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

