This matches what I was nudging towards. I think it offers what we want without taking on a large technical challenge and placing restrictions on the executor data blob.
On Friday, October 2, 2015, Maxim Khutornenko <[email protected]> wrote: > Thanks David. I agree pivoting this proposal around job update driven > instructions would give us more incremental value for much less > effort. I have researched and evaluated various Java and JSON diff > libraries available under Apache license and could not find a > reasonable solution that would give us exactly what we need. I started > thinking about building a custom JSON diff java tool, which is > certainly beyond the effort I hoped it would take. > > With something like 'getJobUpdateInstructions(JobUpdateRequest r)' API > we could immediately improve the client diff story by avoiding > duplicate diff results. We would also be able to have something like > '--dry-run' option on a 'aurora update start' command listing the > affected instances and actions about to be taken. > > As for the actual TaskConfig diff details, this seems like a natural > improvement we could add later when/if we find a better JSON diffing > tool or dare to implement our own. > > Any thoughts/objections to the above? > > On Fri, Oct 2, 2015 at 1:29 PM, David McLaughlin <[email protected] > <javascript:;>> wrote: > > I'd like to propose an alternative - that we start off by having an API > > endpoint which simply returns the JobUpdateInstructions that describes > the > > changes that would happen if a given JobUpdateRequest was applied. > > > > There is a lot of value in having clients ask the scheduler to tell them > > what is going to happen rather than try and duplicate that diff algorithm > > in both places. Of course this does not solve the "pretty" diff command > > when trying to *display* the differences, but it does remove a lot of the > > work in implementing the diff command and I think for the different types > > of clients (UI/CLI/etc.) displaying the diffs is going to be a much > bigger > > problem anyway. > > > > In the future if the scheduler does calculate and return some > > representation of the diff, I'd argue that it should be stored in > > JobUpdateInstructions anyway, so we can always add that functionality > later > > and keep the APIs the same. > > > > Thanks, > > David > > > > > > On Wed, Sep 16, 2015 at 10:23 AM, Bill Farner <[email protected] > <javascript:;>> wrote: > > > >> I think it's fine to provide an 'enhanced' experience when the format is > >> JSON, but i don't think we should force that. > >> > >> On Wed, Sep 16, 2015 at 10:22 AM, Maxim Khutornenko <[email protected] > <javascript:;>> > >> wrote: > >> > >> > The benefit of assuming a certain format is the richer experience we > >> > can give to our users. The *blob* may be too large to make any sense > >> > of it during diffing. I don't propose to enforce any schema though, > >> > that would be too restrictive. I do however believe assuming JSON > >> > format would be an acceptable tradeoff. > >> > > >> > Alternatively, we may allow non-JSON (e.g. binary) executor data blobs > >> > and disabling JSON diffs for any executor types that don't follow the > >> > guidance. This will result in a degraded user experience but may be > >> > the middle ground here. Thoughts? > >> > > >> > On Tue, Sep 15, 2015 at 11:43 AM, <[email protected]> > wrote: > >> > > Isn't this data supposed to be any blob that transparently passes > in to > >> > the executor through mesos data blob. Why would we want to impose any > >> sort > >> > of format? It could be a binary blob. Executor writers should be able > to > >> > move between different schedulers/frameworks without any rework > ideally. > >> > This field seems like more like garbage in and garbage out and only > >> > understood by end client and the executor. Scheduler may stay out of > it. > >> > > If you compute hash and indicate same or different data between 2 > job > >> > update diff, that may be reasonable. > >> > > > >> > > My 2 cents. > >> > > > >> > > Thx > >> > > > >> > > Sent from my iPhone > >> > > > >> > >> On Sep 15, 2015, at 11:01 AM, Zameer Manji <[email protected] > <javascript:;>> wrote: > >> > >> > >> > >> I'm a proponent of firming up our executor <-> scheduler contract. > >> > Since we > >> > >> are going to get multiple executor support soon I think it would be > >> > nice if > >> > >> we said that ExecutorConfig.data was JSON. > >> > >> > >> > >> On Tue, Sep 15, 2015 at 10:47 AM, Maxim Khutornenko < > [email protected] <javascript:;> > >> > > >> > >> wrote: > >> > >> > >> > >>> | I hope this doesn't mean we would be returning a textual > >> > >>> representation of a diff > >> > >>> > >> > >>> If we can make an assumption that executor data is always JSON, we > >> can > >> > >>> deliver a much more specific answer by applying JSON diff tools. > >> > >>> Something like: > >> > >>> > >> > >>> - "environment": "prod" > >> > >>> + "environment": "test" > >> > >>> > >> > >>> Otherwise, we would have to output the entire ExecutorConfig.data > >> blob > >> > >>> content for both left and right sides and let users figure out the > >> > >>> problem. I don't think that's acceptable. > >> > >>> > >> > >>> Does it make sense? Any suggestions on the output format of the > diff? > >> > >>> I think it should be structured but at the same time we have to > get > >> > >>> down to text level at some point to report concrete discrepancies. > >> > >>> > >> > >>>> On Mon, Sep 14, 2015 at 8:58 PM, Bill Farner <[email protected] > <javascript:;>> > >> > wrote: > >> > >>>> The 'blob'-iness of ExecutorConfig is intentional so that we can > >> > support > >> > >>>> alternative executors. I'd hate for that to go away. > >> > >>>> > >> > >>>>> On Mon, Sep 14, 2015 at 8:56 PM, Jake Farrell < > [email protected] <javascript:;> > >> > > >> > >>>> wrote: > >> > >>>> > >> > >>>>> This is one of the hoops encountered when using the Thrift api > >> > directly > >> > >>> and > >> > >>>>> not using the client, I'd love to see ExecutorConfig.data move > to a > >> > >>> thrift > >> > >>>>> object and not be a string blob > >> > >>>>> > >> > >>>>> -Jake > >> > >>>>> > >> > >>>>> On Mon, Sep 14, 2015 at 9:28 PM, Bill Farner < > [email protected] <javascript:;>> > >> > >>> wrote: > >> > >>>>> > >> > >>>>>> I like the idea of adding this API, but i don't see why it > >> requires > >> > >>> us to > >> > >>>>>> make assumptions about ExecutorConfig.data. I hope this > doesn't > >> > mean > >> > >>> we > >> > >>>>>> would be returning a textual representation of a diff. Can you > >> > >>> elaborate > >> > >>>>>> on that? > >> > >>>>>> > >> > >>>>>> On Mon, Sep 14, 2015 at 4:14 PM, Maxim Khutornenko < > >> > [email protected] <javascript:;>> > >> > >>>>>> wrote: > >> > >>>>>> > >> > >>>>>>> Problem: > >> > >>>>>>> We currently don't have a good user experience around "aurora > job > >> > >>>>>>> diff" command. The task configs are dumped as "prettified" > JSON > >> > >>>>>>> strings and diffed with the system diff tool. Anyone who > tried to > >> > >>> use > >> > >>>>>>> it knows it can be very hard to read especially when it comes > to > >> > >>>>>>> executor data deltas. Also, the implementation is done > completely > >> > >>>>>>> within the Aurora client making it hard to reuse this feature > by > >> > >>> other > >> > >>>>>>> clients (e.g.: an external deploy coordination tool). > >> > >>>>>>> > >> > >>>>>>> Proposal: > >> > >>>>>>> Move the diff logic to the scheduler and expose it via a new > >> > >>>>>>> jobConfigDiff thrift API. > >> > >>>>>>> > >> > >>>>>>> Benefits: > >> > >>>>>>> - Client will no longer have the custom non-reusable logic > moving > >> > us > >> > >>>>>>> closer towards a "thin client" goal. > >> > >>>>>>> - The new RPC can be fully used by any existing or new API > >> clients. > >> > >>>>>>> - The diff output will be improved via leveraging third party > >> POJO > >> > >>>>>>> and/or JSON diff libraries [1,2,3, etc.]. > >> > >>>>>>> - The server updater can be partially/fully unified with the > new > >> > >>> diff > >> > >>>>>>> logic further improving the overall DRY-ness. > >> > >>>>>>> > >> > >>>>>>> Concerns: > >> > >>>>>>> - The executor data is currently treated as an opaque string > blob > >> > on > >> > >>>>>>> the scheduler side. In reality, it's almost guaranteed to be > >> JSON. > >> > >>> In > >> > >>>>>>> order to deliver the best UX, the scheduler would have to > start > >> > >>>>>>> requiring ExecutorConfig.data to be a valid JSON. > >> > >>>>>>> > >> > >>>>>>> Any other concerns/objections/comments? I would like to > formalize > >> > >>> the > >> > >>>>>>> proposal be EOW if we reach consensus quickly. > >> > >>>>>>> > >> > >>>>>>> Thanks, > >> > >>>>>>> Maxim > >> > >>>>>>> > >> > >>>>>>> [1] - > >> > >>> > >> > > >> > http://java-object-diff.readthedocs.org/en/latest/getting-started/#getting-started > >> > >>>>>>> [2] - http://javers.org/documentation/diff-examples/ > >> > >>>>>>> [3] - https://github.com/skyscreamer/JSONassert > >> > >>> > >> > >>> -- > >> > >>> Zameer Manji > >> > >>> > >> > >>> > >> > > >> >
