Here's why I think we need a representation for incremental updates that's
tailored to the ALTO data model, rather than using the general JSON Patch
representation.

As I understand it, JSON is a standardized way for a computer to create a
serialized, machine-independent representation of a data structure, send
that serialization over a stream to another computer, and have the other
computer recreate that data structure. This is a simplification, of
course, but I believe that's the goal.

JSON Patch is a standard way to represent the changes to a data structure,
ship them to another computer, and have a JSON Patch library on the other
computer automatically update the remote data structure, with little
additional work for either computer.
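
For example, a JSON Patch is itself a small JSON document listing
operations, and a generic library can apply it without knowing anything
about what the data means. Here's a toy illustration (the document and
patch are invented, and the little loop only handles object add/replace,
nothing like the full RFC 6902 operation set):

     doc = {"meta": {"version": 3}, "tags": ["alpha", "beta"]}
     patch = [
         {"op": "replace", "path": "/meta/version", "value": 4},
         {"op": "add",     "path": "/meta/owner",   "value": "alice"},
     ]
     # A real JSON Patch library walks each "path" pointer and edits
     # the document in place; a minimal stand-in for these two ops:
     for op in patch:
         *parents, leaf = op["path"].lstrip("/").split("/")
         target = doc
         for key in parents:
             target = target[key]
         target[leaf] = op["value"]  # "add" and "replace" both assign
     # doc is now {"meta": {"version": 4, "owner": "alice"},
     #             "tags": ["alpha", "beta"]}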

That's a wonderful goal. Unfortunately, that approach has three problems when we
apply it to ALTO: (1) JSON does not have data representations that
directly correspond to the ALTO data structures, so JSON cannot capture
the semantics of the ALTO data. (2) As a result, JSON Patch is an
inefficient representation of the legal changes. (3) For the clients who
need incremental update, that inefficiency is a deal breaker.

Let's take the last first. What clients need incremental update? Clients
who keep full cost and network maps. But what clients would do that? After
all, clients care about costs between endpoints. Clients don't really care
about PIDs. PIDs are just an abstraction to make the space of endpoints
more manageable. For most ALTO clients, the Endpoint Cost Service (ECS) is
exactly what they want, and they'd much rather use that than go through the
hassle of downloading the maps, searching them, and keeping them
up-to-date.

So why would a client use full maps? Because the client needs to look up
costs very quickly, and cannot tolerate the delay of querying the ALTO
Server. For example, a P2P tracker must select, out of 5,000 peers, the 50
with the lowest cost to a given peer. And a tracker might do that 10 times
a second.
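
To get a feel for that workload, here's a hypothetical sketch (the names,
the in-memory tables, and the use of Python are all my assumptions) of
the tracker's inner loop; the point is that it is nothing but local
lookups:

     import heapq

     # Hypothetical: pid_of maps an endpoint to its PID index, and
     # cost[src][dst] is the numeric cost between two PIDs, both held
     # in the tracker's own memory.
     def best_peers(target, candidates, pid_of, cost, n=50):
         src = pid_of[target]
         # 5,000 candidates, 10 times a second: this must be pure
         # in-memory lookups, with no round trip to the ALTO server.
         return heapq.nsmallest(
             n, candidates, key=lambda peer: cost[src][pid_of[peer]])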

As for the second point, incremental update is only necessary for large
maps. If a map only has 25 PIDs, why bother? Just download a new version.
What do I mean by "large"? A Network Map with 5,000 PIDs and 250,000
prefixes, whose Cost Map therefore has up to 25,000,000 (5,000 x 5,000)
cost points.

Yes, that seems huge. Will anyone ever build that large an ALTO server? I
don't know. But I think a lot of us remember when the IPv4 address space
seemed infinite. Or when a 100 meg disk was big.

Now consider point 1: JSON does not do a good job of representing the ALTO
data. Take Cost Maps. A Cost Map is a square sparse matrix of numbers
indexed by strings. JSON has no such data structure, so in JSON we
represent that as a lookup table of lookup tables of costs. But that
consumes a lot more space than necessary. Furthermore, at least for most
cost metrics, the values are low precision (do you really think that a
routingcost of 49.99999 is any better than a cost of 50?), and the string
indexes -- the PID names -- don't change very often.
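
Written out (as a toy fragment shaped like the ALTO cost-map encoding --
a Python literal here, but it reads the same as the JSON), that looks
like:

     # Toy fragment: every row repeats the full PID-name strings as
     # keys, even though the costs themselves are tiny numbers.
     cost_map = {
         "PID1": {"PID1": 1,  "PID2": 5,  "PID3": 10},
         "PID2": {"PID1": 5,  "PID2": 1,  "PID3": 15},
         "PID3": {"PID1": 10, "PID2": 15, "PID3": 1},
     }
     # At 5,000 x 5,000, each row repeats 5,000 key strings -- far
     # more bytes than the costs they index.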

So if a client needs to handle a 5,000 x 5,000 Cost Map, and look up
costs in microseconds, the client converts the PID names to numbers from
0 to N-1, so it can use a sparse, numerically indexed array, and it
stores the costs as single-precision floats, not double-precision, to
save 100 megs of RAM.
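
Continuing with the toy cost_map above, a sketch of that conversion
(numpy, the dense layout, and all the names are my assumptions; the
point is only the shape of the result):

     import numpy as np

     # Dense layout used here only for simplicity; a client whose map
     # really is sparse might use a sparse structure instead.
     pid_index = {name: i for i, name in enumerate(sorted(cost_map))}
     n = len(pid_index)
     costs = np.full((n, n), np.inf, dtype=np.float32)  # 4 bytes each
     for src, row in cost_map.items():
         for dst, c in row.items():
             costs[pid_index[src], pid_index[dst]] = c
     # 5,000 x 5,000 = 25,000,000 entries: roughly 100 MB as float32
     # versus roughly 200 MB as float64 -- hence the savings.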

The mismatch is even worse for Network Maps. A Network Map is a lookup
table from PID names to sets of prefixes. JSON has lookup tables, but
doesn't have sets, so we represent the sets by arrays. But this confounds
JSON Patch, because order matters in arrays. Furthermore, the JSON
representation does not capture the semantics that a prefix can only be in
one PID. So if the server moves 1.2.3.4 from PID2 to PID1, JSON Patch
would need the following update commands:

     add 1.2.3.4 at index 17 in the array for PID1
     delete index 6 from the array for PID2

But if we know the real semantics of ALTO Network Maps, we can represent
that update as:

     add 1.2.3.4 to PID1

The delete from PID2 is implicit.
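
Spelled out (the JSON Pointer paths, the indexes, and the ALTO-aware
delta format are all invented for illustration), the two styles look
roughly like this:

     # JSON Patch has to speak in terms of array positions:
     json_patch_delta = [
         {"op": "add", "path": "/network-map/PID1/ipv4/17",
          "value": "1.2.3.4"},
         {"op": "remove", "path": "/network-map/PID2/ipv4/6"},
     ]
     # A hypothetical ALTO-aware delta only names the prefix's new
     # home; the removal from PID2 follows from the rule that a prefix
     # lives in exactly one PID:
     alto_delta = {"add": {"PID1": {"ipv4": ["1.2.3.4"]}}}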

Here's the bottom line: Clients who need incremental update will NOT
store data in a format that looks like the JSON data model. Such a client
will read the JSON data, convert it into a totally different form, and
then discard the original JSON. If we use JSON Patch to represent deltas,
a client would NEVER be able to use a standard JSON Patch library to
automatically apply the patches. Instead, the client would need custom
code that understands every possible JSON Patch update command and
figures out how to apply it to the client's representation of the data.
And the client may be forced to use a suboptimal data structure to allow
that (e.g., store prefixes as arrays rather than sets).
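
To make that concrete, here's a hypothetical sketch of such custom code,
assuming the client stores each PID's prefixes as a set and receives
patches using the invented paths from the earlier sketch:

     # Hypothetical: the client must interpret every operation itself
     # and map it onto its own structures (here, PID name -> set of
     # prefixes), instead of making one library call.
     def apply_network_map_patch(ops, pid_prefixes):
         for op in ops:
             _, _, pid, family, index = op["path"].split("/")
             if op["op"] == "add":
                 pid_prefixes[pid].add(op["value"])
             elif op["op"] == "remove":
                 # JSON Patch hands us an array index, but a set has
                 # no stable order; the client must keep a parallel
                 # array just to resolve it, or store arrays after all.
                 raise NotImplementedError("need array order to resolve")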

This does not simplify anything; it just makes more work for the client.

    - Wendy Roome

