I was a strong proponent of NDJ at one point, but I've grown less strident and more weary since then.
Brad Baxter has a good overview of some options[1]. I'm assuming it's a given that we'd all prefer to work with valid JSON files if the pain point can be brought down far enough.

A couple of years have passed since we first talked about this stuff, and the state of JSON pull-parsers is better than it once was:

* yajl[2] is a super-fast C library for parsing JSON, and it supports stream parsing. Bindings for ruby, node, python, and perl are linked from the home page. I found one PHP binding[3] on github, which is broken/abandoned, and no other pull-parser for PHP that I can find. Sadly, the ruby wrapper doesn't actually expose the callbacks necessary for pull-parsing, although there is a pull request[4] and at least one other option[5].
* Perl's JSON::XS supports incremental parsing.
* The Jackson Java library[6] is excellent and has an easy-to-use pull-parser. There are a few simplistic efforts to wrap it for jruby/jython use as well.

Pull-parsing is ugly, but no longer astoundingly difficult or slow, with the possible exception of PHP. And output is simple enough.

As much as it makes me shudder, I think we're probably better off trying to do pull parsers and having a marc-in-json document be a valid JSON array.

We could easily adopt a *convention* of, essentially, one-record-per-line, but wrap it in '[]' to make it valid JSON. That would allow folks with a pull-parser to write a real streaming reader, and folks without one to "cheat" (ditch the leading and trailing [], and read the rest as one-record-per-line) until such time as they can start using a more full-featured JSON parser.

1. http://en.wikipedia.org/wiki/User:Baxter.brad/Drafts/JSON_Document_Streaming_Proposal
2. http://lloyd.github.com/yajl/
3. https://github.com/sfalvo/php-yajl
4. https://github.com/brianmario/yajl-ruby/pull/50
5. http://dgraham.github.com/json-stream/
6. http://wiki.fasterxml.com/JacksonHome

On Thu, Dec 1, 2011 at 12:56 PM, Michael B. Klein <mbkl...@gmail.com> wrote:

> +1 to marc-in-json
> +1 to newline-delimited records
> +1 to read support
> +1 to edsu, rsinger, BillDueber, gmcharlt, and the other module maintainers
>
> On Thu, Dec 1, 2011 at 9:31 AM, Keith Jenkins <k...@cornell.edu> wrote:
>
> > On Thu, Dec 1, 2011 at 11:56 AM, Gabriel Farrell <gsf...@gmail.com> wrote:
> > > I suspect newline-delimited will win this race.
> >
> > Yes. Everyone please cast a vote for newline-delimited JSON.
> >
> > Is there any consensus on the appropriate mime type for ndj?
> >
> > Keith

--
Bill Dueber
Library Systems Programmer
University of Michigan Library
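
For illustration, the "cheat" reader mentioned in the message above (ditch the leading and trailing brackets, read the rest as one record per line) might look roughly like this in Python. It is only a sketch under some assumptions that the thread does not settle: each record is a JSON object on its own line, and every record line except the last ends with a comma so that the file as a whole stays a valid JSON array. The function name and the sample records are made up for the example.

```python
import json


def read_records_by_line(lines):
    """Yield one parsed record per line of a '[...]'-wrapped file.

    Assumes (hypothetically) that each record is a JSON object on a
    single line, so a leading '[' or trailing ']' on a line can only be
    the wrapper of the enclosing array, never part of a record.
    """
    for line in lines:
        text = line.strip()
        # Drop the array's opening bracket, alone or before the first record.
        if text.startswith("["):
            text = text[1:].lstrip()
        # Drop the array's closing bracket, alone or after the last record.
        if text.endswith("]"):
            text = text[:-1].rstrip()
        # Drop the comma that separates this record from the next one.
        if text.endswith(","):
            text = text[:-1]
        if text:
            yield json.loads(text)


# Invented sample data: two tiny stand-in records, one per line.
sample = """[
{"leader": "rec1", "fields": []},
{"leader": "rec2", "fields": []}
]"""

records = list(read_records_by_line(sample.splitlines()))
```

Because the same file is also a plain JSON array, anything with a full parser (pull-style or not) can read it without knowing about the line convention, and the line-based reader only has to hold one record in memory at a time.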