I read the code more closely, and I now see the idea: at `taskDuration`, tasks should do a final publish and exit. That's what `finalize` is for. When `finalize` is true, `checkpointTaskGroup` checks whether any task in the group has completed and, if so, stops all of its replicas. This makes sense: there is no point in the replicas continuing to run once some task in the group is done, because they are all doing the same work.
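To make the finalize behavior described above concrete, here is a rough sketch. It is a hypothetical, simplified model, not Druid's actual `checkpointTaskGroup` code. The class `FinalizeSketch`, the `TaskState` enum and the `tasksToStop` method are illustrative names, and a `null` map value stands in for a task whose status is unknown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model of the finalize path of a task group.
 * None of these names are Druid's real API; they only illustrate the logic.
 */
public class FinalizeSketch {

  /** Known task states; a null value in the map means "status unknown". */
  public enum TaskState { RUNNING, SUCCESS, FAILED }

  /**
   * Returns the IDs of the replicas to stop when the group is finalized.
   * If any replica has already succeeded, the others are redundant (they all
   * do the same work), so every non-successful replica is stopped.
   * Otherwise nothing is stopped and all replicas keep running.
   */
  public static List<String> tasksToStop(Map<String, TaskState> replicaStates) {
    List<String> toStop = new ArrayList<>();
    if (!replicaStates.containsValue(TaskState.SUCCESS)) {
      // No replica has finished yet: let them all keep running.
      return toStop;
    }
    for (Map.Entry<String, TaskState> entry : replicaStates.entrySet()) {
      if (entry.getValue() != TaskState.SUCCESS) {
        toStop.add(entry.getKey());
      }
    }
    return toStop;
  }

  public static void main(String[] args) {
    Map<String, TaskState> group = new LinkedHashMap<>();
    group.put("replica-0", TaskState.SUCCESS);
    group.put("replica-1", TaskState.RUNNING);
    group.put("replica-2", null); // status unknown
    System.out.println(tasksToStop(group)); // prints [replica-1, replica-2]
  }
}
```

In this model a replica with unknown status is stopped only after some other replica in the group has succeeded. Until then, every replica keeps running.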
With your patch, when `finalize` is true, `checkpointTaskGroup` will now kill any task that has a null status. I don't see why this is desirable. After `taskDuration` is over, we want to trigger a final checkpoint/publish and then let all tasks in the group keep running until one of them succeeds. Killing a task whose status is unknown seems counter-productive to that goal. Am I wrong? Is there a reason it's a good idea to kill tasks with unknown status in this case?

[ Full content available at: https://github.com/apache/incubator-druid/pull/6206 ]
This message was relayed via gitbox.apache.org for [email protected]
