+1. If I recall correctly, the rolling update mechanism was added to Aurora because having the client coordinate batching was pretty tricky. I think the same applies to a rolling restart.
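To illustrate what coordinating batches from the client involves, here is a minimal sketch in Python. It is not Aurora's client code: `rolling_restart`, `restart_batch`, and `wait_until_healthy` are hypothetical names, with the two callables standing in for a call like restartShards and for whatever health polling the client would have to implement itself.

```python
from typing import Callable, Iterable, Iterator, List


def batches(instance_ids: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive groups of at most batch_size instance ids."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    ids = list(instance_ids)
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def rolling_restart(
    instance_ids: Iterable[int],
    batch_size: int,
    restart_batch: Callable[[List[int]], None],       # hypothetical: e.g. wraps restartShards
    wait_until_healthy: Callable[[List[int]], bool],  # hypothetical: client-side health poll
) -> List[int]:
    """Restart instances one batch at a time, stopping at the first unhealthy batch.

    Returns the ids that were restarted and confirmed healthy.
    """
    restarted: List[int] = []
    for batch in batches(instance_ids, batch_size):
        restart_batch(batch)
        if not wait_until_healthy(batch):
            # The client has to decide what to do here (retry, roll back, abort);
            # this sketch simply aborts.
            raise RuntimeError(f"batch {batch} did not become healthy")
        restarted.extend(batch)
    return restarted
```

Even this small loop leaves open what happens if the client dies mid-restart or a batch never becomes healthy, which is the kind of state the scheduler already tracks for updates.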
Considering the job controller technically supports this, adding a new RPC to expose this behaviour would be beneficial.

On Thu, Mar 2, 2017 at 7:40 PM, Cody G <codyhg...@gmail.com> wrote:

> Hi all,
>
> I'd like to implement some new functionality in Aurora allowing for rolling
> job restarts. There are many reasons why we might need to restart a job,
> e.g. freeing instances of a job from deadlock or refreshing some sort of
> external configuration.
>
> Currently, there are two options to execute a rolling restart, however both
> are undesirable: either use the restartShards endpoint and implement
> batching client-side, or use startJobUpdate with slightly modified task
> config so that a non-empty job diff forces an update. I propose adding a
> new thrift RPC for launching a rolling restart, which is an interface
> around the existing upgrade logic. Instead of requiring a TaskConfig and
> instanceCount, this restart endpoint will only accept JobUpdateSettings and
> will simply launch an update with the currently used task configuration.
> All of the existing job update RPCs will still be able to access updates
> which were launched from this restart endpoint. This ensures restarts are
> available in the UI and no additional storage changes are required.
>
> If this proposal seems reasonable, I’ll file a ticket and draft up a more
> detailed RFC for further review.
>
> Cody
>
> --
> Zameer Manji