sub007 edited a comment on issue #10838:
URL: https://github.com/apache/druid/issues/10838#issuecomment-797575217


   I see the same issue as reported above. 
   
   Here is what I witnessed:
   I have a supervisor running with a task duration of 60 minutes.
   When I check the supervisor's status, it is UNHEALTHY_TASKS.
   The reason given in the supervisor status is a set of tasks that failed some 
time ago, not in the recent past. The supervisor has long since moved past the 
point where those tasks failed with a concurrent execution exception: it has 
created multiple tasks since then, and those tasks ingested data from the 
corresponding Kafka topic for the configured task duration and then terminated. 
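For reference, this is roughly how I check the status (a sketch; the router address and supervisor id below are placeholders for my deployment, not values from this issue):

```shell
# Assumption: Druid router/overlord reachable at localhost:8888 and a
# supervisor id of "my-kafka-supervisor" -- substitute your own values.
DRUID_ROUTER="http://localhost:8888"
SUPERVISOR_ID="my-kafka-supervisor"

# The /status payload includes the supervisor "state" (e.g. RUNNING or
# UNHEALTHY_TASKS) and details of the recent task failures it is
# attributing its health to.
curl -s "${DRUID_ROUTER}/druid/indexer/v1/supervisor/${SUPERVISOR_ID}/status"
```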
   But still, all of those tasks show their status as FAILED. 
   I checked the task logs but found no errors in the task index logs, and 
nothing in the logs of any of the other services either.
   My understanding is that a supervisor moves to UNHEALTHY_TASKS after 3 
successive tasks end up in a failed state, and returns to a healthy state 
after 3 successive tasks succeed.
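The threshold behavior described above can be modeled with a short sketch. This is a simplified model of the health-state transitions, not Druid's actual implementation; the threshold of 3 matches the documented defaults for `druid.supervisor.taskUnhealthinessThreshold` and `druid.supervisor.taskHealthinessThreshold`:

```python
HEALTHY = "HEALTHY"
UNHEALTHY_TASKS = "UNHEALTHY_TASKS"
THRESHOLD = 3  # consecutive task outcomes needed to flip the state


def supervisor_state(task_outcomes):
    """Replay task outcomes ('SUCCESS'/'FAILED') and return the final state.

    Simplified model: a run of THRESHOLD consecutive failures marks the
    supervisor unhealthy; a run of THRESHOLD consecutive successes is
    needed before it is considered healthy again.
    """
    state = HEALTHY
    failed_streak = 0
    success_streak = 0
    for outcome in task_outcomes:
        if outcome == "FAILED":
            failed_streak += 1
            success_streak = 0
        else:
            success_streak += 1
            failed_streak = 0
        if state == HEALTHY and failed_streak >= THRESHOLD:
            state = UNHEALTHY_TASKS
        elif state == UNHEALTHY_TASKS and success_streak >= THRESHOLD:
            state = HEALTHY
    return state


# Three consecutive failures flip the supervisor to UNHEALTHY_TASKS,
# and it only recovers after three consecutive successes.
print(supervisor_state(["FAILED", "FAILED", "FAILED"]))            # UNHEALTHY_TASKS
print(supervisor_state(["FAILED"] * 3 + ["SUCCESS"] * 3))          # HEALTHY
```

Under this model, the state reported now should reflect the most recent run of outcomes, which is why the old failures being cited as the reason is surprising.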
   
   So, there are two questions:
   1. Why do all the tasks end up in a FAILED state even though there are no 
errors in the logs?
   2. When I look at the supervisor status, why does it list a set of 3 tasks 
that failed with a concurrent execution exception a long time ago, rather than 
any of the latest failed tasks, as the reason for its unhealthy state? 
   
   My discussion with Peter on the Slack channel is 
[here](https://the-asf.slack.com/archives/CJ8D1JTB8/p1615212967202500).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
