waixiaoyu opened a new issue #6577: Tasks got stuck in "Pending" states forever 
caused by empty callback
URL: https://github.com/apache/incubator-druid/issues/6577
 
 
   One day, while our Druid (0.11.0) cluster was restarting, I found some tasks
stuck in the pending task list. Several days later the cluster was working
well, but these tasks were still stuck there. Every module was normal except
these tasks.
   
   In fact, according to the Overlord log, these tasks had finished in the
SUCCESS state, and the data from these tasks had been ingested into Druid and
appeared in the coordinator console.
   
   After analyzing the source code, I found what seems to be a small bug.
   
   Normally, the Overlord restarts in the following steps:
   1. wait for leader election
   2. wait several seconds before starting manage() in the TaskQueue thread
   3. restore tasks and attach the normal callback function
   4. add the tasks to the pending task list
   
   The MiddleManager can restore all tasks that were running before the
restart and register them in Zookeeper. The Overlord then receives CHILD_ADDED
events from the running tasks and updates its running task list. But if the
Overlord receives a Zookeeper task-running event before manage() starts, it
registers an empty callback, and the pending task list can never be cleared.
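
   The ordering problem described above can be illustrated with a small,
self-contained sketch. This is not Druid's code: the Runner class, its methods,
and the "task-1" id are hypothetical stand-ins, and the model assumes that the
first callback registered for a task wins. It only shows how an event arriving
before manage() can leave a no-op callback in place.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

public class EmptyCallbackRace {
    // Hypothetical stand-in for the task runner; not Druid's RemoteTaskRunner.
    static class Runner {
        private final Map<String, Consumer<String>> callbacks = new ConcurrentHashMap<>();

        // Simulates a CHILD_ADDED event: if no work item exists yet,
        // one is created with an empty (no-op) callback.
        void onChildAdded(String taskId) {
            callbacks.putIfAbsent(taskId, id -> { });
        }

        // Simulates the queue handing a task to the runner with a real callback.
        // Assumption: an already existing work item keeps its old callback.
        void run(String taskId, Consumer<String> callback) {
            callbacks.putIfAbsent(taskId, callback);
        }

        // Simulates the task finishing: whatever callback was registered is invoked.
        void onTaskSuccess(String taskId) {
            Consumer<String> callback = callbacks.remove(taskId);
            if (callback != null) {
                callback.accept(taskId);
            }
        }
    }

    // Returns the pending set left over after one task has finished successfully.
    static Set<String> simulate(boolean eventBeforeManage) {
        Set<String> pending = new LinkedHashSet<>();
        pending.add("task-1");
        Runner runner = new Runner();

        if (eventBeforeManage) {
            runner.onChildAdded("task-1"); // event arrives before manage() starts
        }
        // Stand-in for manage(): attach the real callback that clears the pending entry.
        for (String id : new ArrayList<>(pending)) {
            runner.run(id, pending::remove);
        }
        if (!eventBeforeManage) {
            runner.onChildAdded("task-1"); // normal order: event arrives after manage()
        }
        runner.onTaskSuccess("task-1");
        return pending;
    }

    public static void main(String[] args) {
        System.out.println("event after manage():  " + simulate(false)); // prints []
        System.out.println("event before manage(): " + simulate(true));  // prints [task-1]
    }
}
```

   In the second run the task has succeeded, yet it is still listed as pending,
which matches the symptom reported here.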
   
   The bug can be reproduced by setting a breakpoint in TaskQueue and waiting
for the Zookeeper event to arrive.
   
   Finally, this bug can be identified by 2 symptoms:
   1. tasks stay stuck in the pending task list forever
   2. the normal callback registered in Map<String, ListenableFuture<TaskStatus>>
taskFutures is never cleaned up. TaskQueue:260
   
   Even though it is only a display bug, it is quite confusing for users.
   Registering a valid callback on CHILD_ADDED and CHILD_UPDATED, instead of an
empty callback, when creating a new RemoteTaskRunnerWorkItem can fix it.
RemoteTaskRunner:978
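
   A minimal sketch of the proposed direction, under stated assumptions: the
Runner class, its constructor argument, and the method names below are
hypothetical and do not reflect the actual RemoteTaskRunner API. The idea is
only that a work item created from a Zookeeper event carries a real completion
callback instead of an empty one, so an early event no longer matters.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

public class ValidCallbackOnEvent {
    // Hypothetical stand-in for the task runner; not Druid's RemoteTaskRunner.
    static class Runner {
        private final Map<String, Consumer<String>> callbacks = new ConcurrentHashMap<>();
        // Assumption: the runner is given a valid completion callback up front.
        private final Consumer<String> completionCallback;

        Runner(Consumer<String> completionCallback) {
            this.completionCallback = completionCallback;
        }

        // Simulates CHILD_ADDED / CHILD_UPDATED: the new work item gets
        // the valid callback rather than a no-op.
        void onChildEvent(String taskId) {
            callbacks.putIfAbsent(taskId, completionCallback);
        }

        // Simulates the queue handing a task to the runner.
        void run(String taskId, Consumer<String> callback) {
            callbacks.putIfAbsent(taskId, callback);
        }

        // Simulates the task finishing: the registered callback is invoked.
        void onTaskSuccess(String taskId) {
            Consumer<String> callback = callbacks.remove(taskId);
            if (callback != null) {
                callback.accept(taskId);
            }
        }
    }

    // The event arrives before the manage() stand-in, the order that triggers the bug.
    static Set<String> simulateEventBeforeManage() {
        Set<String> pending = new LinkedHashSet<>();
        pending.add("task-1");
        Runner runner = new Runner(pending::remove);

        runner.onChildEvent("task-1"); // early Zookeeper event
        for (String id : new ArrayList<>(pending)) {
            runner.run(id, pending::remove); // stand-in for manage()
        }
        runner.onTaskSuccess("task-1");
        return pending;
    }

    public static void main(String[] args) {
        System.out.println("pending after success: " + simulateEventBeforeManage()); // prints []
    }
}
```

   How the real callback should be made available at the point where the work
item is created is a design decision for the actual patch; passing it to the
constructor here is just the simplest way to show the effect.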

