Interesting; I can totally see how that's a problem (especially with Java implementations).
It seems like ALTO has a choice here between: a) using a standard format on the wire (json-patch) and encouraging / writing implementations that are more memory-efficient (perhaps supporting streaming), or b) defining its own wire format, and still needing implementation work to take place. I know which I'd choose; YMMV :)

Cheers,

On 29 Jul 2014, at 1:25 am, Wendy Roome <[email protected]> wrote:

> Thanks for the info. I took a quick look at one of the Java
> implementations of JSON patch, and it was pretty much what I expected.
>
> The good news: a library that supports JSON patch can provide incremental
> update with relatively little additional work for the client or the
> server. For clients, a patch(oldNode, patchText) method applies a JSON
> patch to an existing JSON object. For servers, a diff(oldNode, newNode)
> method calculates the JSON patch that represents the changes between two
> versions.
>
> The bad news: to achieve that ease of use, the client & server MUST store
> the ALTO data in objects that the JSON library provides, and access the
> data via methods provided by the JSON library. That is, the client &
> server MUST use the DOM [Document Object Model] dictated by the JSON
> library.
>
> So what's wrong with that? For small maps, nothing. For medium-sized
> maps, it's probably okay.
>
> But for small maps, why bother with incremental update? Just download the
> new version already. Ditto for medium-sized maps. For heaven's sake, look
> at the number of commercial web sites with a footprint of a megabyte or
> more!
>
> Incremental update becomes vital for large maps, e.g., thousands of PIDs,
> with cost maps that take hundreds of megabytes. And for those, alas, the
> JSON data model breaks down. Example: I originally used a JSON Java
> library from json.org. Then I tried it on large cost maps. For a
> fully-specified 1,000 PID cost map, the library did read the JSON and
> create the necessary object tree, but it took almost a minute.
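(As an aside: the patch/diff workflow described above involves very little application code once a library provides it. Below is a rough, stdlib-only Python sketch of the "apply" side for just the add/remove/replace ops of RFC 6902; the function name and the toy cost-map structure are illustrative assumptions, not taken from any particular library.)

```python
import json

def apply_patch(doc, patch):
    """Apply a minimal subset of RFC 6902 (add/remove/replace) to nested dicts.

    Stdlib-only sketch: real JSON Patch libraries also handle move/copy/test,
    array indices, and JSON Pointer escaping ("~0"/"~1").
    """
    for op in patch:
        parts = op["path"].lstrip("/").split("/")
        parent = doc
        for key in parts[:-1]:      # walk down to the parent container
            parent = parent[key]
        leaf = parts[-1]
        if op["op"] in ("add", "replace"):
            parent[leaf] = op["value"]
        elif op["op"] == "remove":
            del parent[leaf]
        else:
            raise ValueError("unsupported op: %s" % op["op"])
    return doc

# Toy "cost map" update: replace one cost, delete another.
cost_map = {"meta": {}, "cost-map": {"PID1": {"PID2": 5, "PID3": 8}}}
patch = json.loads("""[
  {"op": "replace", "path": "/cost-map/PID1/PID2", "value": 9},
  {"op": "remove",  "path": "/cost-map/PID1/PID3"}
]""")
apply_patch(cost_map, patch)
```

Note that this sketch still assumes the document lives in an in-memory tree of dicts, which is exactly the DOM-style constraint Wendy describes below.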
> Then I tried
> reading a fully-specified 5,000 PID cost map. The library failed. I told
> the JVM to use four gigs of RAM. It still failed.
>
> So I wrote my own JSON library, with an "on the fly" parser instead of
> the DOM model. That is, the parser scanned the incoming JSON, but instead
> of building its own model of the data, it called sub-class methods for
> each event (enter/leave dictionary or array, found new string/number
> value, etc.). The sub-class methods stored the data in a form optimized
> for the application. For cost maps, I converted the PID names to numeric
> indexes, on the assumption that the PID names don't change that often,
> and stored the costs in a numerically-indexed square matrix of
> single-precision floats.
>
> It's hard to be sure with Java, but my guess is that my optimized
> representation takes about 105 megs of RAM, while the JSON data model (if
> it succeeded) would have taken well over a gig.
>
> In any case, my library can create a 5,000 PID cost map as fast as it can
> read the JSON text.
>
> So the problem I have with JSON patch is that the automatic, hassle-free
> implementations may not scale up to the map sizes we would like to
> support. I think that limitation is inherent in the requirement that
> clients use a DOM dictated by the JSON library.
>
> - Wendy Roome
>
>
> On 07/27/2014, 22:48, "Mark Nottingham" <[email protected]> wrote:
>
>>
>> On 21 Jul 2014, at 2:17 pm, Y. Richard Yang <[email protected]> wrote:
>>
>>> I am not sure I fully understand the context of it will "just work."
>>
>> Just that the PATCH method is defined for generic mechanisms, not
>> application-specific ones; if your payload is application-specific, you
>> might as well use POST.
>>
>>
>>> Here are some issues in our application-specific context, as Wendy
>>> pointed out:
>>>
>>> 1. Ease-of-use: is there an easy-to-use library that just works: it
>>> produces and applies JSON Patch based on existing JSON libraries?
>>> Do you
>>> have any recommended pointers that we may check out?
>>
>> We have a test suite at:
>> https://github.com/json-patch/json-patch-tests
>>
>> One of the community members keeps a list of implementations at:
>> http://jsonpatch.com
>>
>>
>>> 2. The issue of sets: JSON does not have a concept of a set (e.g., a
>>> set of IP prefixes). Hence, one typically uses an array to represent
>>> what actually is a set. With set semantics, patching is simple: e.g.,
>>> just indicate the element to be deleted. But expressing that operation
>>> against the array is cumbersome: one has to remember the array index.
>>>
>>> 3. Batching a set of operations: moving a subset of elements in a set.
>>
>> Yes, I can see how these would be difficult -- but they are possible.
>>
>> Note that we are starting to collect issues for a possible second
>> version of json-patch:
>> https://github.com/json-patch/json-patch2
>> ... and I've noted your feedback at:
>> https://github.com/json-patch/json-patch2/issues/8
>> Please feel free to expand upon your requirements in that issue (and
>> thanks for the feedback!).
>>
>> One approach you could take would be to use json-patch for now, and then
>> use json-patch2 (or whatever it ends up being called) when it ships;
>> that way, you avoid defining an application-specific patch format.
>>
>> Cheers,
>>
>>
>> --
>> Mark Nottingham   https://www.mnot.net/
>>
>>
>>
>>
>
>

--
Mark Nottingham   https://www.mnot.net/

_______________________________________________
alto mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/alto
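[Editorial addendum: the event-driven, non-DOM approach Wendy describes (PID names interned to numeric indexes; costs in a flat single-precision matrix) might look roughly like the Python sketch below. Her actual library is Java, and all class and callback names here are assumptions for illustration; a real implementation would drive the callbacks from a streaming JSON tokenizer rather than calling them by hand.]

```python
from array import array

class CostMapHandler:
    """Stores a cost map as a flat, numerically indexed float32 matrix
    instead of a per-node DOM. For 5,000 PIDs that is 5000 * 5000 * 4 bytes,
    about 100 MB, in line with the ~105 MB estimate in the thread."""

    def __init__(self, num_pids):
        self.n = num_pids
        self.pid_index = {}                               # PID name -> row/col
        self.costs = array("f", [0.0]) * (num_pids * num_pids)

    def _index(self, pid):
        # Intern a PID name the first time we see it.
        return self.pid_index.setdefault(pid, len(self.pid_index))

    def on_cost(self, src, dst, value):
        # Callback a streaming parser would fire for each "dst: cost" pair,
        # instead of building a tree of JSON objects.
        self.costs[self._index(src) * self.n + self._index(dst)] = value

    def cost(self, src, dst):
        return self.costs[self.pid_index[src] * self.n + self.pid_index[dst]]

# Simulated parser events for a tiny map:
h = CostMapHandler(3)
h.on_cost("PID1", "PID2", 5.0)
h.on_cost("PID2", "PID1", 7.5)
```

The trade-off is the one the thread identifies: this scales to large maps, but a generic diff(oldNode, newNode) can no longer be pointed at it, because there is no DOM for the patch library to walk.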

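[Editorial addendum: Richard's point 2 above can be made concrete. With a set encoded as a JSON array, a JSON Patch "remove" must name the element's current position rather than the element itself. A small stdlib-only Python illustration, with prefix values invented for the example:]

```python
# A "set" of IP prefixes, necessarily encoded as a JSON array.
prefixes = ["10.0.0.0/8", "192.168.0.0/16", "172.16.0.0/12"]

# Set semantics: "delete 192.168.0.0/16". JSON Patch semantics: the op must
# carry the element's array index, so the sender first has to find it...
idx = prefixes.index("192.168.0.0/16")
remove_op = {"op": "remove", "path": "/prefixes/%d" % idx}

# ...and the receiver's array must be in exactly the same order, or the
# index addresses the wrong element. Applying the op:
del prefixes[int(remove_op["path"].rsplit("/", 1)[1])]
```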