On Wed, Nov 19, 2014 at 10:35 AM, Joshua M. Clulow <[email protected]> wrote:
> On 19 November 2014 09:13, Francois Billard <[email protected]> wrote:
> > we print the standardized column name in 'zfs_do_list' function :
> >
> >     static char default_fields[] =
> >         "name,used,available,referenced,mountpoint";
> >
> > the name of properties MUST not ever change, else the code that will
> > use them will break every time.
>
> I agree, and this is what I was attempting to convey: that they be the
> standard, lowercase names as provided to "-o". Sorry for the
> confusion.
>
> > Your suggestion about parseable values and human readable values are
> > already reflected (zfs natural way) :
> >
> > with human readable values :
> >
> >> zfs list -J -o used | python -m json.tool
> > {
> >     "cmd": "zfs list -J -o used",
> >     "stdout": [
> >         {
> >             "used": "55K"
> >         },
> >         {
> >             "used": "56,5K"
> >         }
> >     ]
> > }
> >
> > and with bytes values (-p option) :
> >
> >> zfs list -pJ -o used | python -m json.tool
> > {
> >     "cmd": "zfs list -pJ -o used",
> >     "stdout": [
> >         {
> >             "used": "56320"
> >         },
> >         {
> >             "used": "57856"
> >         }
> >     ]
> > }
>
> So, I actually think that "-J" should _imply_ (i.e. force) "-p". It
> does not make sense to provide non-parsable values in a
> machine-readable format, especially if we are aiming for a strict,
> well-documented schema for the resultant output that we commit to
> supporting over time.
>
> > Concerning the streaming manner (a JSON objects on each line) : if you
> > do that, you will not have JSON output, but a bloc of text containing
> > several json object and you will have to parse it with regexp to load
> > each json object : very complicated.
>
> No, this is absolutely not true. The format I'm referring to is often
> described as LDJSON or "Line Delimited JSON"[1], a kind of JSON
> streaming format[2]. Critically, no newline characters (the byte
> 0x0A) appear anywhere within a JSON record -- only _between_ records.
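As an illustration of the "-p" point, here is a minimal Python sketch of a
consumer of the block-JSON form quoted above. The "-J" flag and the
{"cmd": ..., "stdout": [...]} schema are the proposal under discussion,
not a shipped interface; the sample document mirrors the quoted output.

```python
import json

# Sample mirroring the proposed "zfs list -pJ -o used" output quoted
# above (hypothetical schema, not a shipped interface).
sample = '''{
    "cmd": "zfs list -pJ -o used",
    "stdout": [
        {"used": "56320"},
        {"used": "57856"}
    ]
}'''

# The whole document must be buffered and parsed before any record
# can be consumed -- this is the block-JSON model.
doc = json.loads(sample)

# With "-p" the values are plain byte counts, so they parse cleanly
# as integers; "55K" or "56,5K" would not.
used = [int(record["used"]) for record in doc["stdout"]]
print(used)
```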
> This makes it trivial to read and parse in basically any modern
> environment:
>
>  - In C, use getline(3C) to read lines from a FILE * and then pass
>    each one into a JSON parsing library
>
>  - In node.js, use the "lstream" module to read one line at a time
>    and JSON.parse()
>
>  - In shell, use a sed(1)-like utility that understands
>    line-delimited JSON, like json[3] or jq[4]; these make it trivial
>    to manipulate each JSON object into some filtered or transformed
>    version as part of a shell pipeline
>
>  - Other environments such as Python, Ruby and Java all have similar
>    library routines to read one line at a time from a file or other
>    input source; each line is then run through the JSON parser to
>    produce an object describing the current filesystem or other
>    record
>
> [1] http://en.wikipedia.org/wiki/Line_Delimited_JSON
> [2] http://en.wikipedia.org/wiki/JSON_Streaming
> [3] https://github.com/trentm/json
> [4] http://stedolan.github.io/jq
>
> > A well formed JSON object must have root element (as list, dict),
> > which is easily loaded by code that will use the json output on
> > server side (python, java,..)
>
> In contrast, each _line_ in an LDJSON stream is a well-formed JSON
> object containing just the data pertaining to the current record.
> This enables the consumer to work on one record at a time, if that is
> what they require, or to collate incoming records into whatever
> application-specific data structure makes sense to them. Of the
> utmost importance, it requires neither zfs(1M) nor the application
> consuming the stream to produce (and subsequently parse) all of the
> data at one time.

I'm not sure I agree that it's of "utmost importance", but this does
seem like it could be a nice performance enhancement over the existing
interface.

--matt

> This is akin to the difference between scandir(3C) and readdir(3C).
> The former will load the entire directory into memory, sort it, then
> return it in one result to the user.
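The line-at-a-time consumption described in the list above can be
sketched in a few lines of Python. The sample records are illustrative,
and io.StringIO stands in for the pipe from a hypothetical LDJSON-emitting
"zfs list -pJ"; no regexps are needed because each line is a complete
JSON object.

```python
import io
import json

# io.StringIO stands in for the pipe from a hypothetical
# "zfs list -pJ" emitting one JSON object per line (LDJSON).
stream = io.StringIO(
    '{"name": "tank", "used": "56320"}\n'
    '{"name": "tank/home", "used": "57856"}\n'
)

# getline(3C)-style loop: read one line, parse one record.
parsed = []
for line in stream:
    parsed.append(json.loads(line))

print([record["name"] for record in parsed])
```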
> That's fine for small directories, but for larger directories with
> millions of files it can take a very long time, and consume a
> considerable amount of memory and cycles in doing so. Using an
> interface like scandir(3C) has the unfortunate result that processes
> with memory constraints (e.g. Java with a fixed VM heap cap, or
> Node.js with its ~1.5GB heap limitation) are unable to process
> directories beyond a certain size at all. In contrast, a streaming
> interface like readdir(3C) allows the program to read a few directory
> entries, do some processing, and then throw that storage away.
>
> By using LDJSON for the output here, we are allowing for more
> flexible usage of the tooling -- especially on large systems with
> thousands or tens of thousands of filesystems, volumes or snapshots.
> I speak from painful experience dealing with processing large JSON
> datasets from order 50MB up to a couple of gigabytes, often in
> programming environments that simply cannot parse and store the
> entire object tree in memory.
>
> Cheers.
>
> --
> Joshua M. Clulow
> UNIX Admin/Developer
> http://blog.sysmgr.org
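The readdir(3C)-style model above can be sketched in Python with a
generator: only the current record is ever held in memory, however many
filesystems the (hypothetical) LDJSON stream describes.

```python
import json

def records(lines):
    """Parse an LDJSON stream lazily, one record per line."""
    for line in lines:
        # Only the current record is in memory -- the readdir(3C)
        # model, not the scandir(3C) "load everything, then return"
        # model.
        yield json.loads(line)

# Simulate a very large stream lazily, without ever building one
# big document in memory (illustrative data, not real zfs output).
stream = ('{"used": "%d"}' % (512 * i) for i in range(100000))

# Aggregate record by record; peak memory stays flat.
total = sum(int(record["used"]) for record in records(stream))
print(total)
```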
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
