Re: JobConfig diff API

Bill Farner Wed, 16 Sep 2015 10:25:06 -0700

I think it's fine to provide an 'enhanced' experience when the format is
JSON, but i don't think we should force that.


On Wed, Sep 16, 2015 at 10:22 AM, Maxim Khutornenko <[email protected]>
wrote:

> The benefit of assuming a certain format is the richer experience we
> can give to our users. The *blob* may be too large to make any sense
> of it during diffing. I don't propose to enforce any schema though,
> that would be too restrictive. I do however believe assuming JSON
> format would be an acceptable tradeoff.
>
> Alternatively, we may allow non-JSON (e.g. binary) executor data blobs
> and disabling JSON diffs for any executor types that don't follow the
> guidance. This will result in a degraded user experience but may be
> the middle ground here. Thoughts?
>
> On Tue, Sep 15, 2015 at 11:43 AM,  <[email protected]> wrote:
> > Isn't this data supposed to be any blob that transparently passes in to
> the executor through mesos data blob. Why would we want to impose any sort
> of format? It could be a binary blob. Executor writers should be able to
> move between different schedulers/frameworks without any rework ideally.
> This field seems like more like garbage in and garbage out and only
> understood by end client and the executor. Scheduler may stay out of it.
> > If you compute hash and indicate same or different data between 2 job
> update diff, that may be reasonable.
> >
> > My 2 cents.
> >
> > Thx
> >
> > Sent from my iPhone
> >
> >> On Sep 15, 2015, at 11:01 AM, Zameer Manji <[email protected]> wrote:
> >>
> >> I'm a proponent of firming up our executor <-> scheduler contract.
> Since we
> >> are going to get multiple executor support soon I think it would be
> nice if
> >> we said that ExecutorConfig.data was JSON.
> >>
> >> On Tue, Sep 15, 2015 at 10:47 AM, Maxim Khutornenko <[email protected]>
> >> wrote:
> >>
> >>> | I hope this doesn't mean we would be returning a textual
> >>> representation of a diff
> >>>
> >>> If we can make an assumption that executor data is always JSON, we can
> >>> deliver a much more specific answer by applying JSON diff tools.
> >>> Something like:
> >>>
> >>> - "environment": "prod"
> >>> + "environment": "test"
> >>>
> >>> Otherwise, we would have to output the entire ExecutorConfig.data blob
> >>> content for both left and right sides and let users figure out the
> >>> problem. I don't think that's acceptable.
> >>>
> >>> Does it make sense? Any suggestions on the output format of the diff?
> >>> I think it should be structured but at the same time we have to get
> >>> down to text level at some point to report concrete discrepancies.
> >>>
> >>>> On Mon, Sep 14, 2015 at 8:58 PM, Bill Farner <[email protected]>
> wrote:
> >>>> The 'blob'-iness of ExecutorConfig is intentional so that we can
> support
> >>>> alternative executors.  I'd hate for that to go away.
> >>>>
> >>>>> On Mon, Sep 14, 2015 at 8:56 PM, Jake Farrell <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> This is one of the hoops encountered when using the Thrift api
> directly
> >>> and
> >>>>> not using the client, I'd love to see ExecutorConfig.data move to a
> >>> thrift
> >>>>> object and not be a string blob
> >>>>>
> >>>>> -Jake
> >>>>>
> >>>>> On Mon, Sep 14, 2015 at 9:28 PM, Bill Farner <[email protected]>
> >>> wrote:
> >>>>>
> >>>>>> I like the idea of adding this API, but i don't see why it requires
> >>> us to
> >>>>>> make assumptions about ExecutorConfig.data.  I hope this doesn't
> mean
> >>> we
> >>>>>> would be returning a textual representation of a diff.  Can you
> >>> elaborate
> >>>>>> on that?
> >>>>>>
> >>>>>> On Mon, Sep 14, 2015 at 4:14 PM, Maxim Khutornenko <
> [email protected]>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Problem:
> >>>>>>> We currently don't have a good user experience around "aurora job
> >>>>>>> diff" command. The task configs are dumped as "prettified" JSON
> >>>>>>> strings and diffed with the system diff tool. Anyone who tried to
> >>> use
> >>>>>>> it knows it can be very hard to read especially when it comes to
> >>>>>>> executor data deltas. Also, the implementation is done completely
> >>>>>>> within the Aurora client making it hard to reuse this feature by
> >>> other
> >>>>>>> clients (e.g.: an external deploy coordination tool).
> >>>>>>>
> >>>>>>> Proposal:
> >>>>>>> Move the diff logic to the scheduler and expose it via a new
> >>>>>>> jobConfigDiff thrift API.
> >>>>>>>
> >>>>>>> Benefits:
> >>>>>>> - Client will no longer have the custom non-reusable logic moving
> us
> >>>>>>> closer towards a "thin client" goal.
> >>>>>>> - The new RPC can be fully used by any existing or new API clients.
> >>>>>>> - The diff output will be improved via leveraging third party POJO
> >>>>>>> and/or JSON diff libraries [1,2,3, etc.].
> >>>>>>> - The server updater can be partially/fully unified with the new
> >>> diff
> >>>>>>> logic further improving the overall DRY-ness.
> >>>>>>>
> >>>>>>> Concerns:
> >>>>>>> - The executor data is currently treated as an opaque string blob
> on
> >>>>>>> the scheduler side. In reality, it's almost guaranteed to be JSON.
> >>> In
> >>>>>>> order to deliver the best UX, the scheduler would have to start
> >>>>>>> requiring ExecutorConfig.data to be a valid JSON.
> >>>>>>>
> >>>>>>> Any other concerns/objections/comments? I would like to formalize
> >>> the
> >>>>>>> proposal be EOW if we reach consensus quickly.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Maxim
> >>>>>>>
> >>>>>>> [1] -
> >>>
> http://java-object-diff.readthedocs.org/en/latest/getting-started/#getting-started
> >>>>>>> [2] - http://javers.org/documentation/diff-examples/
> >>>>>>> [3] - https://github.com/skyscreamer/JSONassert
> >>>
> >>> --
> >>> Zameer Manji
> >>>
> >>>
>

Re: JobConfig diff API

Reply via email to