Hi Mark,

Thanks a lot for the expert comment. Please see below.
On Sun, Jul 20, 2014 at 11:31 AM, Mark Nottingham <[email protected]> wrote:
> Hi,
>
> It’s difficult to say without knowing more of the specifics, but generally
> - if you’re already using JSON, or considering it for expressing your data
> model (realising that a data model is separable from its serialisation onto
> the wire), JSON Patch may be useful to you. But it sounds like you need to
> figure out whether you want to use JSON.

The ALTO Protocol is already based on JSON. Please see
https://datatracker.ietf.org/doc/draft-ietf-alto-protocol/

> Note that the PATCH method isn’t specific to JSON; you can come up with
> other PATCH formats.

Good comment. We are evaluating how good a match the particular JSON Patch
format in your RFC is for our needs.

> However, the more application-specific your patch format is, the less
> likely that it’ll “just work.”

I am not sure I fully understand the context of “just work.” Here are some
issues in our application-specific context, as Wendy pointed out:

1. Ease of use: is there an easy-to-use library that just works, i.e., one
that produces and applies JSON Patch on top of existing JSON libraries? Do
you have any recommended pointers that we may check out?

2. The issue of sets: JSON does not have the concept of a set (e.g., a set
of IP prefixes). Hence, one typically uses an array to represent what is
actually a set. Patching a set directly is simple, e.g., naming the element
to be deleted; but expressing the same operation against the array is
cumbersome, because one has to track the element's array index.

3. Batching a set of operations: e.g., moving a subset of elements from one
set to another.

Any comments or pointers?

Richard

> Cheers,
>
>
> On 18 Jul 2014, at 12:55 pm, Y. Richard Yang <[email protected]> wrote:
>
> > Hi Wendy,
> >
> > Always good comments. Please see below.
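To make issue 2 above concrete, here is a minimal sketch (plain Python,
standard library only; the prefix values and the /pid1 path are made up for
illustration) of why deleting a member from a set encoded as a JSON array
forces the patch generator to do index bookkeeping:

```python
import json

# A "set" of IP prefixes, encoded as a JSON array (JSON has no set type).
pid1 = json.loads('["1.2.3.0/24", "5.6.0.0/16", "9.8.7.0/24"]')

# Semantically we want: remove "5.6.0.0/16" from the set.
# With JSON Patch we must instead say: remove the element at index i,
# so the patch generator has to look the index up first.
target = "5.6.0.0/16"
index = pid1.index(target)  # 1
patch = [{"op": "remove", "path": f"/pid1/{index}"}]

print(json.dumps(patch))
# If server and client ever hold the array in different orders, the
# same index-based patch deletes the wrong prefix.
```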
> >
> > On Wed, Jul 9, 2014 at 3:14 PM, Wendy Roome <[email protected]> wrote:
> > > Here's why I think we need a representation for incremental updates
> > > that's tailored to the ALTO data model, rather than using the general
> > > JSON Patch representation.
> > >
> > > As I understand it, JSON is a standardized way for a computer to create
> > > a serialized, machine-independent representation of a data structure,
> > > send that serialization over a stream to another computer, and have the
> > > other computer recreate that data structure. This is a simplification,
> > > of course, but I believe that's the goal.
> > >
> > > JSON Patch is a standard way to represent the changes to a data
> > > structure, ship them to another computer, and have a JSON Patch library
> > > on the other computer automatically update the remote data structure,
> > > with little additional work for either computer.
> > >
> > > That's a wonderful goal. Unfortunately, it has three problems when we
> > > apply it to ALTO: (1) JSON does not have data representations that
> > > directly correspond to the ALTO data structures, so JSON cannot capture
> > > the semantics of the ALTO data. (2) As a result, JSON Patch is an
> > > inefficient representation of the legal changes. (3) For the clients
> > > who need incremental update, that inefficiency is a deal breaker.
> > >
> > > Let's take the last first. What clients need incremental update?
> > > Clients who keep full cost and network maps. But what clients would do
> > > that? After all, clients care about costs between endpoints. Clients
> > > don't really care about PIDs. PIDs are just an abstraction to make the
> > > space of endpoints more manageable. For most ALTO clients, the Endpoint
> > > Cost Service (ECS) is exactly what they want, and they'd much rather
> > > use that than go through the hassle of downloading the maps, searching
> > > them, and keeping them up-to-date.
> > >
> > > So why would a client use full maps?
> > > Because the client needs to look up costs very quickly, and cannot
> > > tolerate the delay of querying the ALTO Server. For example, a P2P
> > > tracker must select, out of 5,000 peers, the 50 with the lowest cost
> > > to a given peer. And a tracker might do that 10 times a second.
> > >
> > > As for the second point, incremental update is only necessary for
> > > large maps. If a map only has 25 PIDs, why bother? Just download a new
> > > version. What do I mean by "large"? A Network Map with 5,000 PIDs,
> > > 250,000 prefixes, and up to 25,000,000 cost points.
> > >
> > > Yes, that seems huge. Will anyone ever build that large an ALTO
> > > server? I don't know. But I think a lot of us remember when the IPv4
> > > address space seemed infinite. Or when a 100 MB disk was big.
> > >
> > > Now consider point 1: JSON does not do a good job of representing the
> > > ALTO data. Take Cost Maps. A Cost Map is a square sparse matrix of
> > > numbers indexed by strings. JSON has no such data structure, so in
> > > JSON we represent that as a lookup table of lookup tables of costs.
> > > But that consumes a lot more space than necessary. Furthermore, at
> > > least for most cost metrics, the values are low precision (do you
> > > really think that a routingcost of 49.99999 is any better than a cost
> > > of 50?), and the string indexes -- the PID names -- don't change very
> > > often.
> > >
> > > So if a client needs to handle a 5,000 x 5,000 Cost Map, and look up
> > > costs in microseconds, the client converts the PID names to numbers
> > > from 0 to N-1, so it can use a sparse numerically indexed array, and
> > > it stores the costs as single-precision floats, not double-precision,
> > > to save 100 MB of RAM.
> > >
> > > The mismatch is even worse for Network Maps. A Network Map is a lookup
> > > table from PID names to sets of prefixes. JSON has lookup tables, but
> > > doesn't have sets, so we represent the sets by arrays. But this
> > > confounds JSON Patch, because order matters in arrays.
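The client-side conversion Wendy describes can be sketched as follows
(plain Python, standard library only; the tiny cost map is made up): PID
names are interned as integers 0..N-1 and the matrix is stored as a flat
array of single-precision floats:

```python
import array
import json

# A tiny JSON Cost Map, as a lookup table of lookup tables (values made up).
cost_map = json.loads('''
{"PID1": {"PID1": 1, "PID2": 50},
 "PID2": {"PID1": 50, "PID2": 1}}
''')

# Intern the PID names as integers 0..N-1.
pids = sorted(cost_map)
index = {name: i for i, name in enumerate(pids)}
n = len(pids)

# Flat N x N array of single-precision floats ('f'): half the RAM of
# doubles, with absent (sparse) entries marked as infinity.
costs = array.array('f', [float('inf')] * (n * n))
for src, row in cost_map.items():
    for dst, cost in row.items():
        costs[index[src] * n + index[dst]] = cost

def lookup(src, dst):
    """Cost lookup by PID name via integer indexing, no JSON involved."""
    return costs[index[src] * n + index[dst]]

print(lookup("PID1", "PID2"))  # 50.0
```

A JSON Patch arriving from the server cannot be applied to `costs` by any
standard JSON library; the client needs custom code to translate each patch
operation into updates of this numeric structure, which is the mismatch
being argued here.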
> > > Furthermore, the JSON representation does not capture the semantics
> > > that a prefix can only be in one PID. So if the server moves 1.2.3.4
> > > from PID2 to PID1, JSON Patch would need the following update
> > > commands:
> > >
> > >     add 1.2.3.4 at index 17 in the array for PID1
> > >     delete index 6 from the array for PID2
> > >
> > > But if we know the real semantics of ALTO Network Maps, we can
> > > represent that update as:
> > >
> > >     add 1.2.3.4 to PID1
> > >
> > > The delete from PID2 is implicit.
> > >
> > > Here's the bottom line: clients who need incremental update will NOT
> > > store data in a format that looks like the JSON data model. Such a
> > > client will read the JSON data, convert it into a totally different
> > > form, and then discard the original JSON. If we use JSON Patch to
> > > represent deltas, a client would NEVER be able to use a standard JSON
> > > library to automatically apply the patches. Instead, the client would
> > > need custom code that understands every possible JSON Patch update
> > > command and figures out how to apply it to the client's representation
> > > of the data. And the client may be forced to use a suboptimal data
> > > structure to allow that (e.g., storing prefixes as arrays rather than
> > > sets).
> > >
> > > This does not simplify anything; it just makes more work for the
> > > client.
> >
> > After reading your discussion, I have the following picture of the
> > workflow in mind:
> >
> >   Original Data Structure at ALTO Server (DSS) => (transformation T1)
> >   JSON at Server (JSONS) ----> (transmission/encoding)
> >   JSON at Client (JSONC) => (transformation T2)
> >   Data Structure at Client (DSC)
> >
> > Here are some points:
> >
> > 1. JSONS == JSONC, which can be defined as JSON.
> > 2. It is possible that DSS != JSON != DSC.
> > 3. Your key point is that DSC should be efficient (e.g., a trie), in
> > memory and/or lookup.
> > 4.
> > A related point is that T2, which implements point 3, may need to be
> > highly customized, and hence is unlikely to be provided by a standard
> > JSON library, although many libraries provide automatic conversion from
> > JSON to a specific data type (e.g., in Java).
> >
> > I like the arguments!
> >
> > Before solving the preceding efficiency problem, I want to first solve
> > the automation problem. In other words, assume that we use JSON Patch.
> > Is there a library that provides automatic generation (at the server)
> > and application (at the client) of JSON Patch? I googled around and
> > found the following:
> > http://stackoverflow.com/questions/7326532/delta-encoding-for-json-objects
> >
> > The preceding is not complete, and I can imagine other approaches. For
> > example, I can define a wrapper data type, say Set', that wraps a
> > generic type such as Set, such that the user can modify an instance of
> > Set' using only the operations that Set' provides. Then, upon each
> > invocation of a mutator on Set', the type can produce the JSON Patch
> > operation automatically, before delegating the real operation to Set.
> > An issue with this approach, however, is how to produce the JSON
> > Pointer path when an instance of Set' might be a field of a more
> > complex data structure.
> >
> > Before we draw the conclusion that JSON Patch will mostly add more
> > work, I would still prefer that it be more rigorously "proven" that it
> > is hard to develop a good library for JSON Patch. I took the liberty of
> > cc'ing the co-authors of JSON Patch, hoping that they may provide
> > additional pointers.
> >
> > Thanks!
> >
> > Richard
> >
> > > - Wendy Roome
> > >
> > > _______________________________________________
> > > alto mailing list
> > > [email protected]
> > > https://www.ietf.org/mailman/listinfo/alto
>
> --
> Mark Nottingham   http://www.mnot.net/
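The Set' wrapper idea quoted above can be sketched as follows (plain
Python, standard library only; the class name and the example path are
illustrative, not a proposed API). Each mutator records the corresponding
JSON Patch operation before delegating to the underlying storage; the JSON
Pointer prefix is supplied at construction, which sidesteps, rather than
solves, the question of deriving the path automatically when the set is
nested inside a larger document:

```python
class PatchingSet:
    """A set wrapper whose mutators record JSON Patch operations.

    Illustrative sketch only. Because JSON serializes a set as an array,
    the recorded operations must carry array indices; `path` is the JSON
    Pointer to the wrapped array within the enclosing document.
    """

    def __init__(self, items, path):
        self._items = list(items)  # array order must be kept for indices
        self._path = path
        self.patch = []            # accumulated JSON Patch operations

    def add(self, item):
        if item not in self._items:
            self._items.append(item)
            self.patch.append({
                "op": "add",
                "path": f"{self._path}/{len(self._items) - 1}",
                "value": item,
            })

    def discard(self, item):
        if item in self._items:
            i = self._items.index(item)
            del self._items[i]
            self.patch.append({"op": "remove", "path": f"{self._path}/{i}"})

pid1 = PatchingSet(["10.0.0.0/8"], "/network-map/PID1")
pid1.add("1.2.3.4/32")
pid1.discard("10.0.0.0/8")
print(pid1.patch)
```

Note that even here the wrapper must keep its members in a list rather
than a set, so that the emitted indices match the JSON array layout --
which is exactly Wendy's point about clients being forced into suboptimal
data structures.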
_______________________________________________
alto mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/alto
