Re: JobConfig diff API

Maxim Khutornenko Tue, 06 Oct 2015 16:02:56 -0700

Here is a brief summary of proposed changes:
https://docs.google.com/document/d/1Fc_YhhV7fc4D9Xv6gJzpfooxbK4YWZcvzw6Bd3qVTL8


On Fri, Oct 2, 2015 at 1:47 PM, Bill Farner <[email protected]> wrote:
> This matches what I was nudging towards.  I think it offers what we want
> without taking on a large technical challenge and placing restrictions on
> the executor data blob.
>
> On Friday, October 2, 2015, Maxim Khutornenko <[email protected]> wrote:
>
>> Thanks David. I agree pivoting this proposal around job update driven
>> instructions would give us more incremental value for much less
>> effort. I have researched and evaluated various Java and JSON diff
>> libraries available under Apache license and could not find a
>> reasonable solution that would give us exactly what we need. I started
>> thinking about building a custom JSON diff java tool, which is
>> certainly beyond the effort I hoped it would take.
>>
>> With something like 'getJobUpdateInstructions(JobUpdateRequest r)' API
>> we could immediately improve the client diff story by avoiding
>> duplicate diff results. We would also be able to have something like
>> '--dry-run' option on a 'aurora update start' command listing the
>> affected instances and actions about to be taken.
>>
>> As for the actual TaskConfig diff details, this seems like a natural
>> improvement we could add later when/if we find a better JSON diffing
>> tool or dare to implement our own.
>>
>> Any thoughts/objections to the above?
>>
>> On Fri, Oct 2, 2015 at 1:29 PM, David McLaughlin <[email protected]
>> <javascript:;>> wrote:
>> > I'd like to propose an alternative - that we start off by having an API
>> > endpoint which simply returns the JobUpdateInstructions that describes
>> the
>> > changes that would happen if a given JobUpdateRequest was applied.
>> >
>> > There is a lot of value in having clients ask the scheduler to tell them
>> > what is going to happen rather than try and duplicate that diff algorithm
>> > in both places. Of course this does not solve the "pretty" diff command
>> > when trying to *display* the differences, but it does remove a lot of the
>> > work in implementing the diff command and I think for the different types
>> > of clients (UI/CLI/etc.) displaying the diffs is going to be a much
>> bigger
>> > problem anyway.
>> >
>> > In the future if the scheduler does calculate and return some
>> > representation of the diff, I'd argue that it should be stored in
>> > JobUpdateInstructions anyway, so we can always add that functionality
>> later
>> > and keep the APIs the same.
>> >
>> > Thanks,
>> > David
>> >
>> >
>> > On Wed, Sep 16, 2015 at 10:23 AM, Bill Farner <[email protected]
>> <javascript:;>> wrote:
>> >
>> >> I think it's fine to provide an 'enhanced' experience when the format is
>> >> JSON, but i don't think we should force that.
>> >>
>> >> On Wed, Sep 16, 2015 at 10:22 AM, Maxim Khutornenko <[email protected]
>> <javascript:;>>
>> >> wrote:
>> >>
>> >> > The benefit of assuming a certain format is the richer experience we
>> >> > can give to our users. The *blob* may be too large to make any sense
>> >> > of it during diffing. I don't propose to enforce any schema though,
>> >> > that would be too restrictive. I do however believe assuming JSON
>> >> > format would be an acceptable tradeoff.
>> >> >
>> >> > Alternatively, we may allow non-JSON (e.g. binary) executor data blobs
>> >> > and disabling JSON diffs for any executor types that don't follow the
>> >> > guidance. This will result in a degraded user experience but may be
>> >> > the middle ground here. Thoughts?
>> >> >
>> >> > On Tue, Sep 15, 2015 at 11:43 AM,  <[email protected]>
>> wrote:
>> >> > > Isn't this data supposed to be any blob that transparently passes
>> in to
>> >> > the executor through mesos data blob. Why would we want to impose any
>> >> sort
>> >> > of format? It could be a binary blob. Executor writers should be able
>> to
>> >> > move between different schedulers/frameworks without any rework
>> ideally.
>> >> > This field seems like more like garbage in and garbage out and only
>> >> > understood by end client and the executor. Scheduler may stay out of
>> it.
>> >> > > If you compute hash and indicate same or different data between 2
>> job
>> >> > update diff, that may be reasonable.
>> >> > >
>> >> > > My 2 cents.
>> >> > >
>> >> > > Thx
>> >> > >
>> >> > > Sent from my iPhone
>> >> > >
>> >> > >> On Sep 15, 2015, at 11:01 AM, Zameer Manji <[email protected]
>> <javascript:;>> wrote:
>> >> > >>
>> >> > >> I'm a proponent of firming up our executor <-> scheduler contract.
>> >> > Since we
>> >> > >> are going to get multiple executor support soon I think it would be
>> >> > nice if
>> >> > >> we said that ExecutorConfig.data was JSON.
>> >> > >>
>> >> > >> On Tue, Sep 15, 2015 at 10:47 AM, Maxim Khutornenko <
>> [email protected] <javascript:;>
>> >> >
>> >> > >> wrote:
>> >> > >>
>> >> > >>> | I hope this doesn't mean we would be returning a textual
>> >> > >>> representation of a diff
>> >> > >>>
>> >> > >>> If we can make an assumption that executor data is always JSON, we
>> >> can
>> >> > >>> deliver a much more specific answer by applying JSON diff tools.
>> >> > >>> Something like:
>> >> > >>>
>> >> > >>> - "environment": "prod"
>> >> > >>> + "environment": "test"
>> >> > >>>
>> >> > >>> Otherwise, we would have to output the entire ExecutorConfig.data
>> >> blob
>> >> > >>> content for both left and right sides and let users figure out the
>> >> > >>> problem. I don't think that's acceptable.
>> >> > >>>
>> >> > >>> Does it make sense? Any suggestions on the output format of the
>> diff?
>> >> > >>> I think it should be structured but at the same time we have to
>> get
>> >> > >>> down to text level at some point to report concrete discrepancies.
>> >> > >>>
>> >> > >>>> On Mon, Sep 14, 2015 at 8:58 PM, Bill Farner <[email protected]
>> <javascript:;>>
>> >> > wrote:
>> >> > >>>> The 'blob'-iness of ExecutorConfig is intentional so that we can
>> >> > support
>> >> > >>>> alternative executors.  I'd hate for that to go away.
>> >> > >>>>
>> >> > >>>>> On Mon, Sep 14, 2015 at 8:56 PM, Jake Farrell <
>> [email protected] <javascript:;>
>> >> >
>> >> > >>>> wrote:
>> >> > >>>>
>> >> > >>>>> This is one of the hoops encountered when using the Thrift api
>> >> > directly
>> >> > >>> and
>> >> > >>>>> not using the client, I'd love to see ExecutorConfig.data move
>> to a
>> >> > >>> thrift
>> >> > >>>>> object and not be a string blob
>> >> > >>>>>
>> >> > >>>>> -Jake
>> >> > >>>>>
>> >> > >>>>> On Mon, Sep 14, 2015 at 9:28 PM, Bill Farner <
>> [email protected] <javascript:;>>
>> >> > >>> wrote:
>> >> > >>>>>
>> >> > >>>>>> I like the idea of adding this API, but i don't see why it
>> >> requires
>> >> > >>> us to
>> >> > >>>>>> make assumptions about ExecutorConfig.data.  I hope this
>> doesn't
>> >> > mean
>> >> > >>> we
>> >> > >>>>>> would be returning a textual representation of a diff.  Can you
>> >> > >>> elaborate
>> >> > >>>>>> on that?
>> >> > >>>>>>
>> >> > >>>>>> On Mon, Sep 14, 2015 at 4:14 PM, Maxim Khutornenko <
>> >> > [email protected] <javascript:;>>
>> >> > >>>>>> wrote:
>> >> > >>>>>>
>> >> > >>>>>>> Problem:
>> >> > >>>>>>> We currently don't have a good user experience around "aurora
>> job
>> >> > >>>>>>> diff" command. The task configs are dumped as "prettified"
>> JSON
>> >> > >>>>>>> strings and diffed with the system diff tool. Anyone who
>> tried to
>> >> > >>> use
>> >> > >>>>>>> it knows it can be very hard to read especially when it comes
>> to
>> >> > >>>>>>> executor data deltas. Also, the implementation is done
>> completely
>> >> > >>>>>>> within the Aurora client making it hard to reuse this feature
>> by
>> >> > >>> other
>> >> > >>>>>>> clients (e.g.: an external deploy coordination tool).
>> >> > >>>>>>>
>> >> > >>>>>>> Proposal:
>> >> > >>>>>>> Move the diff logic to the scheduler and expose it via a new
>> >> > >>>>>>> jobConfigDiff thrift API.
>> >> > >>>>>>>
>> >> > >>>>>>> Benefits:
>> >> > >>>>>>> - Client will no longer have the custom non-reusable logic
>> moving
>> >> > us
>> >> > >>>>>>> closer towards a "thin client" goal.
>> >> > >>>>>>> - The new RPC can be fully used by any existing or new API
>> >> clients.
>> >> > >>>>>>> - The diff output will be improved via leveraging third party
>> >> POJO
>> >> > >>>>>>> and/or JSON diff libraries [1,2,3, etc.].
>> >> > >>>>>>> - The server updater can be partially/fully unified with the
>> new
>> >> > >>> diff
>> >> > >>>>>>> logic further improving the overall DRY-ness.
>> >> > >>>>>>>
>> >> > >>>>>>> Concerns:
>> >> > >>>>>>> - The executor data is currently treated as an opaque string
>> blob
>> >> > on
>> >> > >>>>>>> the scheduler side. In reality, it's almost guaranteed to be
>> >> JSON.
>> >> > >>> In
>> >> > >>>>>>> order to deliver the best UX, the scheduler would have to
>> start
>> >> > >>>>>>> requiring ExecutorConfig.data to be a valid JSON.
>> >> > >>>>>>>
>> >> > >>>>>>> Any other concerns/objections/comments? I would like to
>> formalize
>> >> > >>> the
>> >> > >>>>>>> proposal be EOW if we reach consensus quickly.
>> >> > >>>>>>>
>> >> > >>>>>>> Thanks,
>> >> > >>>>>>> Maxim
>> >> > >>>>>>>
>> >> > >>>>>>> [1] -
>> >> > >>>
>> >> >
>> >>
>> http://java-object-diff.readthedocs.org/en/latest/getting-started/#getting-started
>> >> > >>>>>>> [2] - http://javers.org/documentation/diff-examples/
>> >> > >>>>>>> [3] - https://github.com/skyscreamer/JSONassert
>> >> > >>>
>> >> > >>> --
>> >> > >>> Zameer Manji
>> >> > >>>
>> >> > >>>
>> >> >
>> >>
>>

Re: JobConfig diff API

Reply via email to