Interesting; I can totally see how that's a problem (especially with Java 
implementations).

It seems like ALTO has a choice here between:

a) using a standard format on the wire (json-patch) and encouraging / writing 
implementations that are more memory-efficient (perhaps supporting streaming), 
or

b) defining its own wire format, and still needing implementation work to take 
place.

I know which I'd choose; YMMV :)

Cheers,


On 29 Jul 2014, at 1:25 am, Wendy Roome wrote:

> Thanks for the info. I took a quick look at one of the Java
> implementations of JSON Patch, and it was pretty much what I expected.
> 
> The good news: A library that supports JSON Patch can provide incremental
> update with relatively little additional work for the client or the
> server. For clients, a patch(oldNode,patchText) method applies a JSON
> patch to an existing JSON object. For servers, a diff(oldNode,newNode)
> method calculates the json patch that represents the changes between two
> versions.
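
A minimal sketch of what the client-side patch step amounts to, assuming the data is already held in a generic tree (java.util.Map for JSON objects, java.util.List for arrays). The class MiniJsonPatch and its apply method are hypothetical, not the API of any particular library; only the RFC 6902 "add", "remove" and "replace" operations are handled, with no error checking:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical minimal JSON Patch (RFC 6902) applier over a generic tree:
 * Map for JSON objects, List for arrays. Handles only "add", "remove" and
 * "replace"; no "move", "copy" or "test", and no error checking.
 */
final class MiniJsonPatch {

    /** Split a JSON Pointer (RFC 6901) into unescaped reference tokens. */
    static List<String> tokens(String pointer) {
        if (!pointer.startsWith("/")) throw new IllegalArgumentException("unsupported pointer: " + pointer);
        List<String> out = new ArrayList<>();
        for (String t : pointer.substring(1).split("/", -1)) {
            out.add(t.replace("~1", "/").replace("~0", "~"));  // RFC 6901 order: ~1 first
        }
        return out;
    }

    /** Apply one operation in place to root (a Map or a List). */
    @SuppressWarnings("unchecked")
    static void apply(Object root, String op, String path, Object value) {
        List<String> toks = tokens(path);
        Object parent = root;
        for (int i = 0; i < toks.size() - 1; i++) {            // walk down to the parent container
            String t = toks.get(i);
            parent = (parent instanceof Map)
                    ? ((Map<String, Object>) parent).get(t)
                    : ((List<Object>) parent).get(Integer.parseInt(t));
        }
        String last = toks.get(toks.size() - 1);
        if (parent instanceof Map) {
            Map<String, Object> m = (Map<String, Object>) parent;
            switch (op) {
                case "add": case "replace": m.put(last, value); break;  // replace: existence not checked
                case "remove": m.remove(last); break;
                default: throw new IllegalArgumentException("unsupported op: " + op);
            }
        } else {
            List<Object> l = (List<Object>) parent;
            int idx = last.equals("-") ? l.size() : Integer.parseInt(last);  // "-" means append
            switch (op) {
                case "add": l.add(idx, value); break;           // inserts; later elements shift right
                case "replace": l.set(idx, value); break;
                case "remove": l.remove(idx); break;            // later elements shift left
                default: throw new IllegalArgumentException("unsupported op: " + op);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("PID2", 5.0);
        Map<String, Object> costMap = new LinkedHashMap<>();
        costMap.put("PID1", row);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("cost-map", costMap);

        apply(root, "replace", "/cost-map/PID1/PID2", 7.0);
        apply(root, "add", "/cost-map/PID1/PID3", 2.0);
        System.out.println(root);  // {cost-map={PID1={PID2=7.0, PID3=2.0}}}
    }
}
```

Every step walks generic Map/List objects; applying the same patch to an application-specific structure would need extra code.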
> 
> The bad news: To achieve that ease of use, the client & server MUST store
> the ALTO data in objects that the JSON library provides, and access the
> data via methods provided by the JSON library. That is, the client &
> server MUST use the DOM [Document Object Model] dictated by the JSON library.
> 
> So what's wrong with that? For small maps, nothing. For medium-sized maps,
> it's probably okay.
> 
> But for small maps, why bother with incremental update? Just download the
> new version already. Ditto for medium-sized maps. For heaven's sake, look
> at the number of commercial web sites with a footprint of a megabyte or
> more!
> 
> Incremental update becomes vital for large maps. E.g., thousands of PIDs,
> with cost maps that take hundreds of megabytes. And for those, alas, the
> JSON data model breaks down. Example: I originally used a JSON Java
> library from json.org. Then I tried it on large cost maps. For a
> fully-specified 1,000 PID cost map, the library did read the JSON and
> create the necessary object tree, but it took almost a minute. Then I tried
> reading a fully-specified 5,000 PID cost map. The library failed. I told
> the JVM to use four gigs of RAM. It still failed.
> 
> So I wrote my own JSON library, with an "on the fly" parser instead of the
> DOM model. That is, the parser scanned the incoming JSON, but instead of
> building its own model of the data, it called sub-class methods for each
> event (enter/leave dictionary or array, found new string/number value,
> etc). The sub-class methods stored the data in a form optimized for the
> application. For cost maps, I converted the PID names to numeric indexes,
> on the assumption that the PID names don't change that often, and stored
> the costs in a numerically-indexed square matrix of single-precision
> floats.
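
A rough sketch of that event-driven design, assuming the PID names are known up front (e.g. from the network map) and a simplified map shape {"meta": {...}, "cost-map": {src: {dst: cost}}}. EventParser and CostMapHandler are hypothetical names; the scanner reads a String for brevity (a Reader would allow true streaming), skips true/false/null, and does not validate its input:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical handler: stores cost-map entries in a flat float matrix indexed by PID number. */
final class CostMapHandler extends EventParser {
    private final Map<String, Integer> pidIndex = new HashMap<>();
    private final int n;
    private final float[] matrix;               // row-major n x n; NaN = no entry
    private int depth;
    private String topKey, srcPid, lastKey;

    /** pids: PID names assumed known in advance, e.g. from the network map. */
    CostMapHandler(List<String> pids) {
        n = pids.size();
        for (int i = 0; i < n; i++) pidIndex.put(pids.get(i), i);
        matrix = new float[n * n];
        Arrays.fill(matrix, Float.NaN);
    }

    @Override protected void startObject() { depth++; }
    @Override protected void endObject() { depth--; }

    @Override protected void key(String name) {
        if (depth == 1) topKey = name;          // "meta" or "cost-map"
        else if (depth == 2) srcPid = name;     // source PID (inside "cost-map")
        lastKey = name;                         // destination PID at depth 3
    }

    @Override protected void numberValue(double v) {
        if (depth == 3 && "cost-map".equals(topKey)) matrix[idx(srcPid) * n + idx(lastKey)] = (float) v;
    }

    private int idx(String pid) {
        Integer i = pidIndex.get(pid);
        if (i == null) throw new IllegalArgumentException("unknown PID: " + pid);
        return i;
    }

    float cost(String src, String dst) { return matrix[idx(src) * n + idx(dst)]; }

    public static void main(String[] args) {
        CostMapHandler h = new CostMapHandler(List.of("PID1", "PID2"));
        h.parse("{\"meta\":{},\"cost-map\":{\"PID1\":{\"PID1\":1,\"PID2\":5},"
                + "\"PID2\":{\"PID1\":5,\"PID2\":1}}}");
        System.out.println(h.cost("PID1", "PID2"));
    }
}

/** Hypothetical event-driven JSON scanner: builds no tree, just fires callbacks. */
abstract class EventParser {
    private String s;
    private int pos;

    protected void startObject() {}
    protected void endObject() {}
    protected void startArray() {}
    protected void endArray() {}
    protected void key(String name) {}
    protected void stringValue(String v) {}
    protected void numberValue(double v) {}

    final void parse(String text) { s = text; pos = 0; value(); }

    private void ws() { while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++; }

    private void value() {
        ws();
        char c = s.charAt(pos);
        if (c == '{') {
            pos++; startObject(); ws();
            if (s.charAt(pos) == '}') { pos++; endObject(); return; }
            do { ws(); key(string()); ws(); pos++; value(); ws(); }   // pos++ skips ':'
            while (s.charAt(pos++) == ',');                         // ends after consuming '}'
            endObject();
        } else if (c == '[') {
            pos++; startArray(); ws();
            if (s.charAt(pos) == ']') { pos++; endArray(); return; }
            do { value(); ws(); } while (s.charAt(pos++) == ',');   // ends after consuming ']'
            endArray();
        } else if (c == '"') {
            stringValue(string());
        } else if (Character.isLetter(c)) {                         // true/false/null: skipped
            while (pos < s.length() && Character.isLetter(s.charAt(pos))) pos++;
        } else {                                                    // number
            int start = pos;
            while (pos < s.length() && "+-0123456789.eE".indexOf(s.charAt(pos)) >= 0) pos++;
            numberValue(Double.parseDouble(s.substring(start, pos)));
        }
    }

    private String string() {                   // simplified: escaped char kept literally
        StringBuilder sb = new StringBuilder();
        pos++;                                  // opening quote
        while (s.charAt(pos) != '"') {
            char c = s.charAt(pos++);
            if (c == '\\') c = s.charAt(pos++);
            sb.append(c);
        }
        pos++;                                  // closing quote
        return sb.toString();
    }
}
```

No per-entry objects are created during the scan; each cost lands directly in the float array, and entries absent from the JSON stay NaN.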
> 
> It's hard to be sure with Java, but my guess is that my optimized
> representation takes about 105 megs of RAM, while the JSON data model (if
> it succeeded) would have taken well over a gig.
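
A back-of-the-envelope check of those figures, assuming N = 5,000 PIDs and 4-byte floats for the matrix, and (for the tree) roughly 50 bytes per entry for a hash-map node plus a boxed number on a 64-bit JVM. The per-entry figure is an assumption and varies by JVM and library; the remaining few megabytes of the ~105 MB estimate would presumably be the PID name table and JVM overhead:

```latex
\begin{aligned}
\text{matrix} &= N^2 \times 4\ \text{B} = 5000^2 \times 4\ \text{B} = 10^{8}\ \text{B} \approx 95.4\ \text{MiB} \\
\text{tree}   &\gtrsim N^2 \times 50\ \text{B} = 2.5 \times 10^{7} \times 50\ \text{B} = 1.25 \times 10^{9}\ \text{B}
\end{aligned}
```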
> 
> In any case, my library can create a 5,000 PID cost map as fast as it can
> read the JSON text.
> 
> So the problem I have with JSON patch is that the automatic, hassle-free
> implementations may not scale up to the map sizes we would like to
> support. I think that limitation is inherent in the requirement that
> clients use a DOM dictated by the JSON library.
> 
>       - Wendy Roome
> 
> 
> On 07/27/2014, 22:48, "Mark Nottingham" wrote:
> 
>> 
>> On 21 Jul 2014, at 2:17 pm, Y. Richard Yang wrote:
>> 
>>> I am not sure I fully understand the context in which it will "just work."
>> 
>> Just that the PATCH method is defined for generic mechanisms, not
>> application-specific ones; if your payload is application-specific, you
>> might as well use POST.
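
For illustration, a generic PATCH request names its patch format through a registered media type (RFC 6902 registers application/json-patch+json), so the patch semantics come from the media type rather than from the application. A sketch using java.net.http (Java 11+); the URI and the patch body are hypothetical, and the code only builds the request without sending it:

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical sketch: a PATCH request whose body format is named by a generic media type. */
final class PatchRequestSketch {
    // Hypothetical resource URI, for illustration only.
    static final URI COST_MAP = URI.create("http://alto.example.com/costmap/num/routingcost");

    static HttpRequest buildPatch(String patchJson) {
        return HttpRequest.newBuilder(COST_MAP)
                .header("Content-Type", "application/json-patch+json")  // RFC 6902 media type
                .method("PATCH", HttpRequest.BodyPublishers.ofString(patchJson))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildPatch(
                "[{\"op\":\"replace\",\"path\":\"/cost-map/PID1/PID2\",\"value\":7}]");
        // Only builds the request; nothing is sent over the network.
        System.out.println(req.method() + " " + req.uri());
        System.out.println(req.headers().firstValue("Content-Type").orElse("(none)"));
    }
}
```

An application-specific diff format would instead need its own media type (or, as suggested above, a POST to an application-defined resource).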
>> 
>> 
>>> Here are some issues in our application-specific context, as Wendy
>>> pointed out:
>>> 
>>> 1. Ease-of-use: is there an easy-to-use library that just works: it
>>> produces and applies JSON Patch based on existing JSON libraries? Do you
>>> have any recommended pointers that we may check out?
>> 
>> We have a test suite at:
>> https://github.com/json-patch/json-patch-tests
>> 
>> One of the community members keeps a list of implementations at:
>> http://jsonpatch.com
>> 
>> 
>>> 2. The issue of Set: JSON does not have a concept of a Set (e.g., a set
>>> of IP prefixes). Hence, one typically uses an array to represent what
>>> actually is a set. With a true set, patching is simple, e.g., just
>>> indicating the element to be deleted. But indicating the operation on
>>> the array is cumbersome: one has to know the element's array index.
>>> 
>>> 3. Batching a set of operations: moving a subset of elements in a set.
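
To make the Set issue concrete: JSON Patch addresses array members by index, so removing elements from a JSON array used as a set means looking up each element's current position and, when several are removed in one patch, emitting the operations from the highest index down, since each removal shifts the later elements. The same index bookkeeping applies to "move" operations when relocating a subset of elements. A hypothetical helper (SetRemovalSketch.removeOps is not from any library, and the /network-map/PID1/ipv4 path assumes a simplified network-map layout):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: JSON Patch "remove" ops for elements of an array used as a set. */
final class SetRemovalSketch {

    static List<String> removeOps(String arrayPath, List<String> array, Set<String> toRemove) {
        // JSON Patch identifies array members by position, so find each index first.
        List<Integer> indexes = new ArrayList<>();
        for (int i = 0; i < array.size(); i++) {
            if (toRemove.contains(array.get(i))) indexes.add(i);
        }
        // Emit highest index first: each removal shifts every later element down
        // by one, so ascending order would make the remaining paths point at
        // the wrong elements.
        List<String> ops = new ArrayList<>();
        for (int k = indexes.size() - 1; k >= 0; k--) {
            ops.add("{\"op\":\"remove\",\"path\":\"" + arrayPath + "/" + indexes.get(k) + "\"}");
        }
        return ops;
    }

    public static void main(String[] args) {
        List<String> ipv4 = List.of("192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24");
        for (String op : removeOps("/network-map/PID1/ipv4", ipv4,
                Set.of("192.0.2.0/24", "203.0.113.0/24"))) {
            System.out.println(op);
        }
    }
}
```

Running main prints {"op":"remove","path":"/network-map/PID1/ipv4/2"} followed by {"op":"remove","path":"/network-map/PID1/ipv4/0"}.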
>> 
>> Yes, I can see how these would be difficult -- but they are possible.
>> 
>> Note that we are starting to collect issues for a possible second version
>> of json-patch:
>> https://github.com/json-patch/json-patch2
>> ... and I've noted your feedback at:
>> https://github.com/json-patch/json-patch2/issues/8
>> Please feel free to expand upon your requirements in that issue (and
>> thanks for the feedback!).
>> 
>> One approach you could take would be to use json-patch for now, and then
>> use json-patch2 (or whatever it ends up being called) when it ships; that
>> way, you avoid defining an application-specific patch format.
>> 
>> Cheers,
>> 
>> 
>> --
>> Mark Nottingham   https://www.mnot.net/
>> 
>> 
>> 
>> 
> 
> 

--
Mark Nottingham   https://www.mnot.net/

_______________________________________________
alto mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/alto
