I've drafted a small design document for this change: https://docs.google.com/document/d/13xm23SfIRy5zMro82Ok8dRCsr7lKcC0_UUO_tJX21wQ/edit?usp=sharing
Any feedback would be greatly appreciated!

On Tue, Mar 7, 2017 at 11:15 AM, Cody G <codyhg...@gmail.com> wrote:

> Created a ticket https://issues.apache.org/jira/browse/AURORA-1900 and
> assigned to myself.
>
> On Fri, Mar 3, 2017 at 11:29 AM, David McLaughlin <dmclaugh...@apache.org>
> wrote:
>
>> +1 for thinner client.
>>
>> Another reason rolling update was moved to the Scheduler was to have an
>> audit trail of changes to the job. If we could also get these restarts
>> appearing on the job page, it would be great.
>>
>> On Fri, Mar 3, 2017 at 11:15 AM, Zameer Manji <zma...@apache.org> wrote:
>>
>> > +1
>> >
>> > If I recall correctly, the rolling update mechanism was added to Aurora
>> > because having the client coordinate batching was pretty tricky. I think
>> > the same applies here to a rolling restart.
>> >
>> > Considering the job controller technically supports this, adding a new
>> > RPC to expose this behaviour would be beneficial.
>> >
>> > On Thu, Mar 2, 2017 at 7:40 PM, Cody G <codyhg...@gmail.com> wrote:
>> >
>> > > Hi all,
>> > >
>> > > I'd like to implement some new functionality in Aurora allowing for
>> > > rolling job restarts. There are many reasons why we might need to
>> > > restart a job, e.g. freeing instances of a job from deadlock or
>> > > refreshing some sort of external configuration.
>> > >
>> > > Currently, there are two options to execute a rolling restart, however
>> > > both are undesirable: either use the restartShards endpoint and
>> > > implement batching client-side, or use startJobUpdate with slightly
>> > > modified task config so that a non-empty job diff forces an update. I
>> > > propose adding a new thrift RPC for launching a rolling restart, which
>> > > is an interface around the existing upgrade logic. Instead of requiring
>> > > a TaskConfig and instanceCount, this restart endpoint will only accept
>> > > JobUpdateSettings and will simply launch an update with the currently
>> > > used task configuration. All of the existing job update RPCs will still
>> > > be able to access updates which were launched from this restart
>> > > endpoint. This ensures restarts are available in the UI and no
>> > > additional storage changes are required.
>> > >
>> > > If this proposal seems reasonable, I’ll file a ticket and draft up a
>> > > more detailed RFC for further review.
>> > >
>> > > Cody
>> > >
>> > > --
>> > > Zameer Manji
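
For anyone skimming the thread, here is a rough sketch of the client-side batching that the first option in the quoted proposal (restartShards plus batching in the client) would involve. It is only an illustration of why that coordination is awkward to keep in the client: `rolling_restart`, `restart_batch` and `wait_until_healthy` are hypothetical names, the callbacks stand in for whatever scheduler calls a real client would make, and none of this is Aurora's actual API.

```python
from typing import Callable, List, Sequence


def batches(instance_ids: Sequence[int], batch_size: int) -> List[List[int]]:
    """Split instance IDs into consecutive groups of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(instance_ids[i:i + batch_size])
            for i in range(0, len(instance_ids), batch_size)]


def rolling_restart(
    instance_ids: Sequence[int],
    batch_size: int,
    restart_batch: Callable[[List[int]], None],
    wait_until_healthy: Callable[[List[int]], bool],
) -> int:
    """Restart instances one batch at a time.

    restart_batch is a hypothetical stand-in for a call such as
    restartShards; wait_until_healthy is a hypothetical health check.
    Stops at the first batch that does not become healthy and returns
    the number of instances restarted successfully before that point.
    """
    restarted = 0
    for group in batches(instance_ids, batch_size):
        restart_batch(group)
        if not wait_until_healthy(group):
            break  # leave the remaining instances untouched
        restarted += len(group)
    return restarted
```

Even in this stripped-down form the client has to own batching, health checks and failure handling, and nothing is recorded on the scheduler side, which is the audit-trail concern raised earlier in the thread.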