I think it's fine to provide an 'enhanced' experience when the format is JSON, but i don't think we should force that.
On Wed, Sep 16, 2015 at 10:22 AM, Maxim Khutornenko <[email protected]> wrote: > The benefit of assuming a certain format is the richer experience we > can give to our users. The *blob* may be too large to make any sense > of it during diffing. I don't propose to enforce any schema though, > that would be too restrictive. I do however believe assuming JSON > format would be an acceptable tradeoff. > > Alternatively, we may allow non-JSON (e.g. binary) executor data blobs > and disabling JSON diffs for any executor types that don't follow the > guidance. This will result in a degraded user experience but may be > the middle ground here. Thoughts? > > On Tue, Sep 15, 2015 at 11:43 AM, <[email protected]> wrote: > > Isn't this data supposed to be any blob that transparently passes in to > the executor through mesos data blob. Why would we want to impose any sort > of format? It could be a binary blob. Executor writers should be able to > move between different schedulers/frameworks without any rework ideally. > This field seems like more like garbage in and garbage out and only > understood by end client and the executor. Scheduler may stay out of it. > > If you compute hash and indicate same or different data between 2 job > update diff, that may be reasonable. > > > > My 2 cents. > > > > Thx > > > > Sent from my iPhone > > > >> On Sep 15, 2015, at 11:01 AM, Zameer Manji <[email protected]> wrote: > >> > >> I'm a proponent of firming up our executor <-> scheduler contract. > Since we > >> are going to get multiple executor support soon I think it would be > nice if > >> we said that ExecutorConfig.data was JSON. > >> > >> On Tue, Sep 15, 2015 at 10:47 AM, Maxim Khutornenko <[email protected]> > >> wrote: > >> > >>> | I hope this doesn't mean we would be returning a textual > >>> representation of a diff > >>> > >>> If we can make an assumption that executor data is always JSON, we can > >>> deliver a much more specific answer by applying JSON diff tools. > >>> Something like: > >>> > >>> - "environment": "prod" > >>> + "environment": "test" > >>> > >>> Otherwise, we would have to output the entire ExecutorConfig.data blob > >>> content for both left and right sides and let users figure out the > >>> problem. I don't think that's acceptable. > >>> > >>> Does it make sense? Any suggestions on the output format of the diff? > >>> I think it should be structured but at the same time we have to get > >>> down to text level at some point to report concrete discrepancies. > >>> > >>>> On Mon, Sep 14, 2015 at 8:58 PM, Bill Farner <[email protected]> > wrote: > >>>> The 'blob'-iness of ExecutorConfig is intentional so that we can > support > >>>> alternative executors. I'd hate for that to go away. > >>>> > >>>>> On Mon, Sep 14, 2015 at 8:56 PM, Jake Farrell <[email protected]> > >>>> wrote: > >>>> > >>>>> This is one of the hoops encountered when using the Thrift api > directly > >>> and > >>>>> not using the client, I'd love to see ExecutorConfig.data move to a > >>> thrift > >>>>> object and not be a string blob > >>>>> > >>>>> -Jake > >>>>> > >>>>> On Mon, Sep 14, 2015 at 9:28 PM, Bill Farner <[email protected]> > >>> wrote: > >>>>> > >>>>>> I like the idea of adding this API, but i don't see why it requires > >>> us to > >>>>>> make assumptions about ExecutorConfig.data. I hope this doesn't > mean > >>> we > >>>>>> would be returning a textual representation of a diff. Can you > >>> elaborate > >>>>>> on that? > >>>>>> > >>>>>> On Mon, Sep 14, 2015 at 4:14 PM, Maxim Khutornenko < > [email protected]> > >>>>>> wrote: > >>>>>> > >>>>>>> Problem: > >>>>>>> We currently don't have a good user experience around "aurora job > >>>>>>> diff" command. The task configs are dumped as "prettified" JSON > >>>>>>> strings and diffed with the system diff tool. Anyone who tried to > >>> use > >>>>>>> it knows it can be very hard to read especially when it comes to > >>>>>>> executor data deltas. Also, the implementation is done completely > >>>>>>> within the Aurora client making it hard to reuse this feature by > >>> other > >>>>>>> clients (e.g.: an external deploy coordination tool). > >>>>>>> > >>>>>>> Proposal: > >>>>>>> Move the diff logic to the scheduler and expose it via a new > >>>>>>> jobConfigDiff thrift API. > >>>>>>> > >>>>>>> Benefits: > >>>>>>> - Client will no longer have the custom non-reusable logic moving > us > >>>>>>> closer towards a "thin client" goal. > >>>>>>> - The new RPC can be fully used by any existing or new API clients. > >>>>>>> - The diff output will be improved via leveraging third party POJO > >>>>>>> and/or JSON diff libraries [1,2,3, etc.]. > >>>>>>> - The server updater can be partially/fully unified with the new > >>> diff > >>>>>>> logic further improving the overall DRY-ness. > >>>>>>> > >>>>>>> Concerns: > >>>>>>> - The executor data is currently treated as an opaque string blob > on > >>>>>>> the scheduler side. In reality, it's almost guaranteed to be JSON. > >>> In > >>>>>>> order to deliver the best UX, the scheduler would have to start > >>>>>>> requiring ExecutorConfig.data to be a valid JSON. > >>>>>>> > >>>>>>> Any other concerns/objections/comments? I would like to formalize > >>> the > >>>>>>> proposal be EOW if we reach consensus quickly. > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Maxim > >>>>>>> > >>>>>>> [1] - > >>> > http://java-object-diff.readthedocs.org/en/latest/getting-started/#getting-started > >>>>>>> [2] - http://javers.org/documentation/diff-examples/ > >>>>>>> [3] - https://github.com/skyscreamer/JSONassert > >>> > >>> -- > >>> Zameer Manji > >>> > >>> >
