I read the code more closely and now I see that the idea is that at 
`taskDuration`, tasks should do a final publish and exit. So that's what 
`finalize` is for. The `checkpointTaskGroup` function, when `finalize` is true, 
will check if any task completed, and if so, stop all its replicas. This makes 
sense: there is no point in replicas continuing to run if some task in 
the group is done, because they are all doing the same work.
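
To make sure I'm describing the same behavior, here is a minimal sketch of how I read the `finalize` path. It is not the actual supervisor code: the class, the `Status` enum, and the `replicasToStop` helper are all hypothetical names for illustration, and the task group is reduced to a map from task id to status.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FinalizeSketch {
    // Hypothetical stand-in for a task's reported state.
    enum Status { RUNNING, SUCCESS }

    /**
     * Sketch of the finalize path: if any replica in the group has
     * succeeded, return the ids of all other replicas so they can be stopped.
     * If none has succeeded yet, return an empty list.
     */
    static List<String> replicasToStop(Map<String, Status> group) {
        List<String> toStop = new ArrayList<>();
        if (!group.containsValue(Status.SUCCESS)) {
            return toStop; // nobody finished yet: all replicas keep running
        }
        for (Map.Entry<String, Status> e : group.entrySet()) {
            if (e.getValue() != Status.SUCCESS) {
                toStop.add(e.getKey()); // redundant replica doing the same work
            }
        }
        return toStop;
    }

    public static void main(String[] args) {
        Map<String, Status> group = new LinkedHashMap<>();
        group.put("replica-0", Status.SUCCESS);
        group.put("replica-1", Status.RUNNING);
        System.out.println(replicasToStop(group)); // prints [replica-1]
    }
}
```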

With your patch, `checkpointTaskGroup`, when `finalize` is true, will now kill any 
task that has null status. I don't see why this is a good thing. Once 
`taskDuration` is over, we want to trigger a final checkpoint/publish, and then 
let all tasks in a group keep running until one of them is successful. Killing 
one with unknown status seems counter-productive to that goal.
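
A rough sketch of the difference I'm worried about, under the same caveat that this is my reading and not the patch itself. The names `stopAfterSuccess` and `killUnknown` are hypothetical, and a null map value stands in for a task whose status is unknown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NullStatusSketch {
    // Hypothetical stand-in for a task's reported state; null means unknown.
    enum Status { RUNNING, SUCCESS }

    // Behavior I would expect: only act once some replica has succeeded.
    static List<String> stopAfterSuccess(Map<String, Status> group) {
        List<String> toStop = new ArrayList<>();
        if (group.containsValue(Status.SUCCESS)) {
            group.forEach((id, status) -> {
                if (status != Status.SUCCESS) {
                    toStop.add(id);
                }
            });
        }
        return toStop;
    }

    // Behavior as I understand the patch: also kill any task with null status.
    static List<String> killUnknown(Map<String, Status> group) {
        List<String> toKill = new ArrayList<>(stopAfterSuccess(group));
        group.forEach((id, status) -> {
            if (status == null && !toKill.contains(id)) {
                toKill.add(id);
            }
        });
        return toKill;
    }

    public static void main(String[] args) {
        // No replica has succeeded yet, and one has an unknown status.
        Map<String, Status> group = new LinkedHashMap<>();
        group.put("replica-0", Status.RUNNING);
        group.put("replica-1", null);
        System.out.println(stopAfterSuccess(group)); // prints []
        System.out.println(killUnknown(group));      // prints [replica-1]
    }
}
```

In this scenario the second policy removes a replica that might still have gone on to publish successfully, which is the case I'm asking about.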

Am I wrong -- is there a reason it's a good idea to kill tasks with unknown 
status in this case?

[ Full content available at: 
https://github.com/apache/incubator-druid/pull/6206 ]