Re: [alto] JSON Patch vs. custom representation for incremental updates

Jan Seedorf Thu, 10 Jul 2014 07:27:14 -0700

Hi Wendy,

What about new ALTO services in the future (e.g., as proposed in
http://tools.ietf.org/html/draft-seedorf-cdni-request-routing-alto-07)?


I am not a fan of JSON Patch, but a solution for incremental updates based on
JSON Patch should be much more future-proof with respect to new ALTO services
that convey JSON objects other than network/cost maps, right?

 - Jan

> -----Original Message-----
> From: alto [mailto:[email protected]] On Behalf Of Wendy Roome
> Sent: Wednesday, July 09, 2014 9:15 PM
> To: IETF ALTO
> Subject: Re: [alto] JSON Patch vs. custom representation for incremental
> updates
> 
> Here's why I think we need a representation for incremental updates that's
> tailored to the ALTO data model, rather than using the general JSON Patch
> representation.
> 
> As I understand it, JSON is a standardized way for a computer to create a
> serialized, machine-independent representation of a data structure, send
> that serialization over a stream to another computer, and have the other
> computer recreate that data structure. This is a simplification, of
> course, but I believe that's the goal.
> 
> JSON Patch is a standard way to represent the changes to a data structure,
> ship them to another computer, and have a JSON Patch library on the other
> computer automatically update the remote data structure, with little
> additional work for either computer.
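To make concrete what such a library does, here is a minimal sketch of the add/remove/replace subset of JSON Patch (RFC 6902), with paths written as JSON Pointers (RFC 6901). The function names are hypothetical, the move/copy/test operations are omitted, and this is an illustration rather than a conforming implementation.

```python
import copy


def parse_pointer(pointer):
    """Split a JSON Pointer (RFC 6901) into its unescaped reference tokens."""
    if pointer == "":
        return []
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON Pointer: {pointer!r}")
    # RFC 6901: decode "~1" to "/" first, then "~0" to "~".
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def apply_patch(doc, patch):
    """Apply the add/remove/replace subset of RFC 6902 to a copy of doc.

    Working on a deep copy means a failing operation leaves the caller's
    document untouched, so the patch is applied all-or-nothing.
    """
    doc = copy.deepcopy(doc)
    for op in patch:
        tokens = parse_pointer(op["path"])
        if not tokens:
            raise ValueError("whole-document operations are not handled here")
        parent = doc
        for tok in tokens[:-1]:  # walk to the container holding the target
            parent = parent[int(tok)] if isinstance(parent, list) else parent[tok]
        key, kind = tokens[-1], op["op"]
        if isinstance(parent, list):
            if kind == "add":  # insert at the index, or append for "-"
                parent.insert(len(parent) if key == "-" else int(key), op["value"])
            elif kind == "remove":
                del parent[int(key)]
            elif kind == "replace":
                parent[int(key)] = op["value"]
            else:
                raise ValueError(f"unsupported op: {kind}")
        else:
            if kind == "add" or (kind == "replace" and key in parent):
                parent[key] = op["value"]
            elif kind == "remove":
                del parent[key]
            else:
                raise ValueError(f"unsupported op or missing target: {kind} {key}")
    return doc


cost_doc = {"cost-map": {"PID1": {"PID1": 1, "PID2": 5}}}
updated = apply_patch(
    cost_doc, [{"op": "replace", "path": "/cost-map/PID1/PID2", "value": 7}]
)
```

Note that the applier only ever sees generic objects and arrays; nothing in it knows what a PID or a cost is.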
> 
> That's a wonderful goal. Unfortunately, it has three problems when we
> apply it to ALTO: (1) JSON does not have data representations that
> directly correspond to the ALTO data structures, so JSON cannot capture
> the semantics of the ALTO data. (2) As a result, JSON Patch is an
> inefficient representation of the legal changes. (3) For the clients who
> need incremental update, that inefficiency is a deal breaker.
> 
> Let's take the last first. What clients need incremental update? Clients
> who keep full cost and network maps. But what clients would do that? After
> all, clients care about costs between endpoints. Clients don't really care
> about PIDs. PIDs are just an abstraction to make the space of endpoints
> more manageable. For most ALTO clients, the Endpoint Cost Service (ECS) is
> exactly what they want, and they'd much rather use that than go through the
> hassle of downloading the maps, searching them, and keeping them
> up-to-date.
> 
> So why would a client use full maps? Because the client needs to look up
> costs very quickly, and cannot tolerate the delay of querying the ALTO
> Server. For example, a P2P tracker must select, out of 5,000 peers, the 50
> with the lowest cost to a given peer. And a tracker might do that 10 times
> a second.
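A minimal sketch of that tracker lookup, assuming the client holds the full Cost Map locally as nested dictionaries and has already mapped each peer to its PID via the Network Map. The function name, the data layout, and the synthetic numbers are hypothetical; the point is only that each selection is a local scan with no round trip to the ALTO Server.

```python
import heapq
import random


def select_peers(requester_pid, candidates, peer_pid, cost_map, k=50):
    """Return the k candidates with the lowest cost from requester_pid.

    cost_map is a locally held full Cost Map, {src_pid: {dst_pid: cost}};
    peer_pid maps each peer to its PID via the Network Map. A PID pair with
    no cost entry is treated as infinitely expensive.
    """
    row = cost_map.get(requester_pid, {})
    return heapq.nsmallest(
        k, candidates, key=lambda peer: row.get(peer_pid[peer], float("inf"))
    )


# Synthetic demo data (hypothetical): 5,000 peers spread over 100 PIDs.
rng = random.Random(1)
pids = [f"PID{i}" for i in range(100)]
cost_map = {s: {d: rng.randint(1, 100) for d in pids} for s in pids}
peer_pid = {f"peer{i}": pids[i % 100] for i in range(5000)}
best = select_peers("PID0", list(peer_pid), peer_pid, cost_map)
```

`heapq.nsmallest` keeps only a k-element heap, so each selection costs roughly O(n log k) for n candidates.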
> 
> As for the second point, incremental update is only necessary for large
> maps. If a map only has 25 PIDs, why bother? Just download a new version.
> What do I mean by "large"? A Network Map with 5,000 PIDs and 250,000
> prefixes, plus a Cost Map with up to 25,000,000 (5,000 x 5,000) cost points.
> 
> Yes, that seems huge. Will anyone ever build that large an ALTO server? I
> don't know. But I think a lot of us remember when the IPv4 address space
> seemed infinite. Or when a 100 meg disk was big.
> 
> Now consider point 1: JSON does not do a good job of representing the ALTO
> data. Take Cost Maps. A Cost Map is a square sparse matrix of numbers
> indexed by strings. JSON has no such data structure, so in JSON we
> represent that as a lookup table of lookup tables of costs. But that
> consumes a lot more space than necessary. Furthermore, at least for most
> cost metrics, the values are low precision (do you really think that a
> routingcost of 49.99999 is any better than a cost of 50?), and the string
> indexes -- the PID names -- don't change very often.
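Decoded with an ordinary JSON library, such a Cost Map becomes exactly this lookup table of lookup tables. The sketch below builds a hypothetical dense Cost Map (PID names and cost values are made up, and only the "cost-map" member of an ALTO response is modeled) and measures how many bytes of compact JSON text each cost point takes, for comparison with the 4 bytes of a single-precision float.

```python
import json


def synthetic_cost_map(n):
    """Build a hypothetical dense n x n Cost Map as nested JSON objects."""
    pids = [f"PID{i}" for i in range(n)]
    return {
        "cost-map": {
            src: {dst: (i * 7 + j * 3) % 100 + 1 for j, dst in enumerate(pids)}
            for i, src in enumerate(pids)
        }
    }


n = 200
doc = synthetic_cost_map(n)
text = json.dumps(doc, separators=(",", ":"))  # compact encoding
assert json.loads(text) == doc  # round-trips as a table of tables

# Each lookup is two string-keyed probes: outer table, then inner table.
cost = doc["cost-map"]["PID3"]["PID7"]
bytes_per_point = len(text) / (n * n)
print(f"{bytes_per_point:.1f} bytes of JSON per cost point vs 4 for a float32")
```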
> 
> So if a client needs to handle a 5,000 x 5,000 Cost Map and look up costs
> in microseconds, the client converts the PID names to numbers from 0 to
> N-1 so it can use a sparse numerically indexed array, and it stores the
> costs as single-precision floats rather than double-precision, to save 100
> megs of RAM.
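A minimal sketch of that conversion, under stated simplifications: the function names are hypothetical, and a dense N x N array with NaN marking missing entries stands in for the sparse array a real client would use. The last lines check the memory arithmetic: 5,000 x 5,000 = 25,000,000 cost points at 8 bytes each for doubles versus 4 bytes for single-precision floats, a difference of 100,000,000 bytes.

```python
import math
from array import array


def index_cost_map(cost_map):
    """Convert {src: {dst: cost}} into (pid -> index, flat float32 array).

    Dense N*N layout for simplicity; NaN marks PID pairs with no cost.
    """
    names = sorted(set(cost_map) | {d for row in cost_map.values() for d in row})
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    costs = array("f", [math.nan]) * (n * n)  # 4 bytes per entry
    for src, row in cost_map.items():
        base = index[src] * n
        for dst, cost in row.items():
            costs[base + index[dst]] = float(cost)
    return index, costs


def lookup(index, costs, src, dst):
    """O(1) cost lookup once PID names have been mapped to integers."""
    return costs[index[src] * len(index) + index[dst]]


index, costs = index_cost_map({"PID1": {"PID2": 5.5, "PID3": 2.0},
                               "PID2": {"PID1": 5.5}})
print(lookup(index, costs, "PID1", "PID2"))

# Memory check for a 5,000-PID map: bytes saved by float32 over float64.
N = 5000
saved_bytes = N * N * (array("d").itemsize - array("f").itemsize)
```

Once the PID names have been turned into integers, the original JSON text and its nested dictionaries can be discarded.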
> 
> The mismatch is even worse for Network Maps. A Network Map is a lookup
> table from PID names to sets of prefixes. JSON has lookup tables, but
> doesn't have sets, so we represent the sets by arrays. But this confounds
> JSON Patch, because order matters in arrays. Furthermore, the JSON
> representation does not capture the semantics that a prefix can only be in
> one PID. So if the server moves 1.2.3.4 from PID2 to PID1, JSON Patch
> would need the following update commands:
> 
>      add 1.2.3.4 at index 17 in the array for PID1
>      delete index 6 from the array for PID2
> 
> But if we know the real semantics of ALTO Network Maps, we can represent
> that update as:
> 
>      add 1.2.3.4 to PID1
> 
> The delete from PID2 is implicit.
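A minimal sketch of a client that keeps those semantics, assuming it stores each PID's prefixes as a set plus a reverse prefix-to-PID index. The class, its methods, and the delta encoding {"PID1": ["1.2.3.4/32"]} are hypothetical, not an ALTO-defined format, and the address is written as the prefix 1.2.3.4/32. The sketch applies the one-line update and compares its size with the two JSON Patch (RFC 6902) operations the array-based form needs.

```python
import json


class SetNetworkMap:
    """Hypothetical Network Map store that keeps ALTO's semantics: each PID
    holds a set of prefixes, and each prefix belongs to exactly one PID."""

    def __init__(self, pids):
        self.prefixes = {pid: set(pfxs) for pid, pfxs in pids.items()}
        self.owner = {p: pid for pid, pfxs in self.prefixes.items() for p in pfxs}

    def add(self, prefix, pid):
        """Apply "add PREFIX to PID"; removal from the old PID is implicit."""
        old = self.owner.get(prefix)
        if old is not None:
            self.prefixes[old].discard(prefix)
        self.prefixes.setdefault(pid, set()).add(prefix)
        self.owner[prefix] = pid

    def apply_delta(self, delta):
        """Apply a hypothetical delta of the form {pid: [prefix, ...]}."""
        for pid, pfxs in delta.items():
            for prefix in pfxs:
                self.add(prefix, pid)


nm = SetNetworkMap({"PID1": ["10.0.0.0/8"], "PID2": ["1.2.3.4/32", "192.0.2.0/24"]})
custom_delta = {"PID1": ["1.2.3.4/32"]}
nm.apply_delta(custom_delta)

# The same move as JSON Patch against the JSON arrays; the indices depend
# on the order in which the server happened to serialize each array.
json_patch = [
    {"op": "add", "path": "/network-map/PID1/ipv4/1", "value": "1.2.3.4/32"},
    {"op": "remove", "path": "/network-map/PID2/ipv4/0"},
]
print(len(json.dumps(custom_delta)), "vs", len(json.dumps(json_patch)), "bytes")
```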
> 
> Here's the bottom line: Clients who need incremental update will NOT store
> data in a format that looks like the JSON data model. Such a client will read
> the JSON data, convert it into a totally different form, and then discard
> the original JSON. If we use JSON Patch to represent deltas, a client
> would NEVER be able to use a standard JSON library to automatically apply
> the patches. Instead, the client would need custom code that understands
> every possible JSON Patch update command, and figures out how to apply
> them to the client's representation of the data. And the client may be
> forced to use a suboptimal data structure to allow that (e.g., store
> prefixes as arrays rather than sets).
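A client that wants to follow JSON Patch deltas against its own store has to translate each index-based operation itself. The sketch below (class and method names hypothetical; only the ipv4 arrays of a Network Map are handled) shows the consequence: to interpret "remove index 6" it must keep each PID's prefixes in the server's array order rather than as a set.

```python
class PatchableNetworkMap:
    """Hypothetical client store able to follow index-based JSON Patch ops.

    Each PID's prefixes must be kept as a list mirroring the server's JSON
    array, because "remove index 6" only means something relative to that
    order. A reverse index still gives fast prefix -> PID lookups.
    """

    def __init__(self, pids):
        self.arrays = {pid: list(pfxs) for pid, pfxs in pids.items()}
        self.owner = {p: pid for pid, pfxs in self.arrays.items() for p in pfxs}

    def apply(self, op):
        # Only /network-map/<pid>/ipv4/<index> paths are handled, and JSON
        # Pointer escapes (~0, ~1) are ignored in this sketch.
        parts = op["path"].split("/")
        if len(parts) != 5 or parts[1] != "network-map" or parts[3] != "ipv4":
            raise ValueError(f"unsupported path: {op['path']}")
        pid, idx = parts[2], parts[4]
        arr = self.arrays.setdefault(pid, [])
        if op["op"] == "add":
            arr.insert(len(arr) if idx == "-" else int(idx), op["value"])
            self.owner[op["value"]] = pid
        elif op["op"] == "remove":
            prefix = arr.pop(int(idx))
            # A move arrives as add + remove; keep the owner set by the add.
            if self.owner.get(prefix) == pid:
                del self.owner[prefix]
        else:
            raise ValueError(f"unsupported op: {op['op']}")


nm = PatchableNetworkMap({"PID1": ["10.0.0.0/8"],
                          "PID2": ["1.2.3.4/32", "192.0.2.0/24"]})
for op in [
    {"op": "add", "path": "/network-map/PID1/ipv4/1", "value": "1.2.3.4/32"},
    {"op": "remove", "path": "/network-map/PID2/ipv4/0"},
]:
    nm.apply(op)
```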
> 
> This does not simplify anything; it just makes more work for the client.
> 
>     - Wendy Roome
> 
> 
> _______________________________________________
> alto mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/alto
