Hi,

It’s difficult to say without knowing more of the specifics, but generally - if 
you’re already using JSON, or considering it for expressing your data model 
(realising that a data model is separable from its serialisation onto the 
wire), JSON Patch may be useful to you. But it sounds like you need to figure 
out whether you want to use JSON.

Note that the PATCH method isn’t specific to JSON; you can come up with other 
PATCH formats. However, the more application-specific your patch format is, the 
less likely that it’ll “just work”.

Cheers,


On 18 Jul 2014, at 12:55 pm, Y. Richard Yang <[email protected]> wrote:

> Hi Wendy,
> 
> As always, good comments. Please see below.
> 
> On Wed, Jul 9, 2014 at 3:14 PM, Wendy Roome <[email protected]> 
> wrote:
> Here's why I think we need a representation for incremental updates that's
> tailored to the ALTO data model, rather than using the general JSON Patch
> representation.
> 
> As I understand it, JSON is a standardized way for a computer to create a
> serialized, machine-independent representation of a data structure, send
> that serialization over a stream to another computer, and have the other
> computer recreate that data structure. This is a simplification, of
> course, but I believe that's the goal.
> 
> JSON Patch is a standard way to represent the changes to a data structure,
> ship them to another computer, and have a JSON Patch library on the other
> computer automatically update the remote data structure, with little
> additional work for either computer.
> 
> That's a wonderful goal. Unfortunately, it has three problems when we
> apply it to ALTO: (1) JSON does not have data representations that
> directly correspond to the ALTO data structures, so JSON cannot capture
> the semantics of the ALTO data. (2) As a result, JSON Patch is an
> inefficient representation of the legal changes. (3) For clients who
> need incremental updates, that inefficiency is a deal breaker.
> 
> Let's take the last point first. Which clients need incremental updates?
> Clients who keep full cost and network maps. But which clients would do
> that? After all, clients care about costs between endpoints. Clients don't
> really care about PIDs. PIDs are just an abstraction to make the space of
> endpoints more manageable. For most ALTO clients, the Endpoint Cost
> Service (ECS) is exactly what they want, and they'd much rather use that
> than go through the hassle of downloading the maps, searching them, and
> keeping them up-to-date.
> 
> So why would a client use full maps? Because the client needs to look up
> costs very quickly, and cannot tolerate the delay of querying the ALTO
> Server. For example, a P2P tracker must select, out of 5,000 peers, the 50
> with the lowest cost to a given peer. And a tracker might do that 10 times
> a second.
> 
> As for the second point, incremental update is only necessary for large
> maps. If a map only has 25 PIDs, why bother? Just download a new version.
> What do I mean by "large"? A Network Map with 5,000 PIDs and 250,000
> prefixes, and a Cost Map with up to 25,000,000 cost points.
> 
> Yes, that seems huge. Will anyone ever build that large an ALTO server? I
> don't know. But I think a lot of us remember when the IPv4 address space
> seemed infinite. Or when a 100 meg disk was big.
> 
> Now consider point 1: JSON does not do a good job of representing the ALTO
> data. Take Cost Maps. A Cost Map is a square sparse matrix of numbers
> indexed by strings. JSON has no such data structure, so in JSON we
> represent that as a lookup table of lookup tables of costs. But that
> consumes a lot more space than necessary. Furthermore, at least for most
> cost metrics, the values are low precision (do you really think that a
> routingcost of 49.99999 is any better than a cost of 50?), and the string
> indexes -- the PID names -- don't change very often.
> 
> So if a client needs to handle a 5,000 x 5,000 Cost Map, and look up costs
> in microseconds, the client converts the PID names to numbers from 0 to
> N-1, so it can use a sparse, numerically indexed array, and it stores the
> costs as single-precision floats, not double-precision, to save 100 megs
> of RAM.
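[Wendy's conversion step can be sketched roughly as below. This is an illustrative sketch only — the class and method names are hypothetical, not from any ALTO library — using Python's stdlib `array` module for single-precision storage.]

```python
# Hypothetical client-side cost map: PID names are interned as integers
# 0..N-1 once at load time, and costs live in a flat array of 4-byte
# single-precision floats ('f'), half the size of doubles ('d').
from array import array

class CompactCostMap:
    def __init__(self, pid_names):
        self.index = {name: i for i, name in enumerate(pid_names)}
        self.n = len(pid_names)
        # Flat N*N matrix; inf marks "no cost known".
        self.costs = array('f', [float('inf')] * (self.n * self.n))

    def set_cost(self, src, dst, value):
        self.costs[self.index[src] * self.n + self.index[dst]] = value

    def cost_by_index(self, i, j):
        # Hot path: pure integer indexing, no string hashing, once the
        # caller has resolved PID names to indices up front.
        return self.costs[i * self.n + j]
```

[For 5,000 x 5,000 PIDs that is 25,000,000 x 4 bytes, about 100 MB, versus about 200 MB for doubles — the saving Wendy mentions.]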
> 
> The mismatch is even worse for Network Maps. A Network Map is a lookup
> table from PID names to sets of prefixes. JSON has lookup tables, but
> doesn't have sets, so we represent the sets by arrays. But this confounds
> JSON Patch, because order matters in arrays. Furthermore, the JSON
> representation does not capture the semantics that a prefix can only be in
> one PID. So if the server moves 1.2.3.4 from PID2 to PID1, JSON Patch
> would need the following update commands:
> 
>      add 1.2.3.4 at index 17 in the array for PID1
>      delete index 6 from the array for PID2
> 
> But if we know the real semantics of ALTO Network Maps, we can represent
> that update as:
> 
>      add 1.2.3.4 to PID1
> 
> The delete from PID2 is implicit.
> 
> Here's the bottom line: Clients who need incremental updates will NOT store
> data in a format that looks like the JSON data model. Such a client will
> read the JSON data, convert it into a totally different form, and then
> discard the original JSON. If we use JSON Patch to represent deltas, a
> client would NEVER be able to use a standard JSON library to automatically
> apply the patches. Instead, the client would need custom code that
> understands every possible JSON Patch update command and figures out how
> to apply it to the client's representation of the data. And the client may
> be forced to use a suboptimal data structure to allow that (e.g., storing
> prefixes as arrays rather than sets).
>  
> This does not simplify anything; it just makes more work for the client.
>  
> 
> After reading your discussion, I have the following picture of the 
> workflow in mind:
> 
> Original Data Structure at ALTO Server (DSS) => (transformation T1) 
>   JSON at Server (JSONS) ----> (transmission/encoding) 
>      JSON at Client (JSONC) => (transformation T2)
>         Data structure at Client (DSC)
> 
> Here are some points:
> 
> 1. JSONS == JSONC, which can be defined as JSON. 
> 2. It is possible that DSS != JSONS and JSONC != DSC. 
> 3. Your key point is that DSC should be efficient (e.g., a trie), in memory 
> footprint and/or lookup time. 
> 4. A related point is that the T2 which implements point 3 may need to be 
> highly customized, and hence is unlikely to be provided by a standard JSON 
> library, although many libraries provide automatic conversion from JSON to 
> a specific data type (e.g., in Java).
> 
> I like the arguments!
> 
> Before solving the preceding efficiency problem, I want to first solve the 
> automation problem. In other words, assume that we use JSON Patch. Is there 
> a library that provides automatic generation of JSON Patch at the server 
> and automatic application of it at the client? I googled around and found 
> the following:
> http://stackoverflow.com/questions/7326532/delta-encoding-for-json-objects
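[One naive way such a library could generate patches automatically is by diffing the old and new documents. A minimal sketch of my own, handling nested objects and scalar leaves only — arrays are treated as opaque values, which sidesteps rather than solves the array-ordering problem discussed above — and ignoring RFC 6901 key escaping:]

```python
# Generate RFC 6902-style JSON Patch operations by recursively diffing
# two JSON-like objects (dicts of dicts/scalars).
def diff(old, new, path=""):
    ops = []
    if isinstance(old, dict) and isinstance(new, dict):
        for key in old:
            if key not in new:
                ops.append({"op": "remove", "path": f"{path}/{key}"})
            else:
                ops.extend(diff(old[key], new[key], f"{path}/{key}"))
        for key in new:
            if key not in old:
                ops.append({"op": "add", "path": f"{path}/{key}",
                            "value": new[key]})
    elif old != new:
        ops.append({"op": "replace", "path": path, "value": new})
    return ops
```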
> 
> The preceding is not complete, and I can imagine other approaches. For 
> example, I can define a wrapper data type, say Set', that wraps a generic 
> type such as Set, and the user can modify an instance of Set' using only a 
> set of operations that Set' provides. Then, upon each invocation of a 
> mutator on Set', the type can produce the JSON Patch operation 
> automatically, before delegating the real operation to Set. An issue with 
> this approach, however, is how to produce the path (JSON Patch uses JSON 
> Pointer rather than XPath) when an instance of Set' might be a field of a 
> more complex data structure.
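[A minimal sketch of that wrapper idea — all names hypothetical. Here the JSON Pointer path to the set's position in the enclosing document is supplied at construction time, which is one way to address the path question when the wrapper's position is fixed:]

```python
# A set-like wrapper that records a JSON Patch operation for each
# mutation before delegating to the underlying storage.
class PatchingSet:
    def __init__(self, path, items=()):
        self.path = path          # JSON Pointer to this set's JSON array
        self.items = list(items)  # serialized on the wire as a JSON array
        self.patch = []           # accumulated JSON Patch operations

    def add(self, item):
        if item not in self.items:
            # Appending keeps the indices of earlier elements stable.
            self.patch.append({"op": "add",
                               "path": f"{self.path}/{len(self.items)}",
                               "value": item})
            self.items.append(item)

    def remove(self, item):
        i = self.items.index(item)
        self.patch.append({"op": "remove", "path": f"{self.path}/{i}"})
        del self.items[i]
```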
> 
> Before we draw the conclusion that JSON Patch will mostly add more work, I 
> would prefer that it first be more rigorously "proven" that it is hard to 
> develop a good library for JSON Patch. I took the liberty of cc'ing the 
> co-authors of JSON Patch, hoping that they may provide additional pointers.
> 
> Thanks!
> 
> Richard
>  
> 
>     - Wendy Roome
> 
> 
> _______________________________________________
> alto mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/alto

--
Mark Nottingham   http://www.mnot.net/



