On 7/29/2015 1:18 AM, Sönke Ludwig wrote:
Am 29.07.2015 um 00:29 schrieb Walter Bright:
1. Not sure that 'JSON' needs to be embedded in the public names.
'parseJSONStream' should just be 'parseStream', etc. Name
disambiguation, if needed, should be ably taken care of by a number of D
features for that purpose. Additionally, I presume that the stdx.data
package implies a number of different formats. These formats should all
use the same names with as similar as possible APIs - this won't work
too well if JSON is embedded in the APIs.

This is actually one of my pet peeves. Having a *readable* API that tells the
reader immediately what happens is IMO one of the most important aspects (far
more important than an API that allows quick typing). A number of times I've
seen D code that omits part of what it actually does in its name and the result
was that it was constantly necessary to scroll up to see where a particular name
might come from. So I have a strong preference to keep "JSON", because it's an
integral part of the semantics.

I agree with your goal of readability. And if someone wants to write code that emphasizes it's JSON, they can write it as std.data.json.parseStream. (It's not about saving typing, it's about avoiding extra redundant redundancy, I'm a big fan of Strunk & White :-) ) This is not a huge deal for me, but I'm not in favor of establishing a new convention that repeats the module name. It eschews one of the advantages of having module name spaces in the first place, and evokes the old C style naming conventions.
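
For instance (a sketch with the hypothetical module and function names from above; std.data.bson is made up purely for the sake of the example), D's import machinery already lets a reader put "JSON" back at the call site:

    // Two ways to keep call sites explicit without baking "JSON" into the API.
    static import std.data.json;                            // forces full qualification
    import std.data.json : parseJSONStream = parseStream;   // local rename

    void example(string text)
    {
        auto a = std.data.json.parseStream(text);  // fully qualified call
        auto b = parseJSONStream(text);            // renamed import
    }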



2. JSON is a trivial format, http://json.org/. But I count 6 files and
30 names in the public API.

The whole thing provides a stream parser with high level helpers to make it
convenient to use, a DOM module, a separate lexer, and a generator module that
operates in several different modes (maybe two additional modes still to come!).
Every single function provides real and frequently useful benefits. So if
anything, there are still some little things missing.

I understand there is a purpose to each of those things, but there's also considerable value in a simpler API.


All in all, even if JSON may be a simple format, the source code is already
almost 5k LOC (including unit tests, of course).

I don't count unit tests as LOC :-)

But apart from maintainability
they have mainly been separated to minimize the amount of code that needs to be
dragged in for a particular functionality (not only other JSON modules, but also
from different parts of Phobos).

They are so strongly related I don't see this as a big issue. Also, if they are templates, they don't get compiled in if not used.


3. Stepping back a bit, when I think of parsing JSON data, I think:

     auto ast = inputrange.toJSON();

where toJSON() accepts an input range and produces a container, the ast.
The ast is just a JSON value. Then, I can just query the ast to see what
kind of value it is (using overloading), and walk it as necessary.

We can drop the "Value" part of the name of course, if we expect that function
to be used a lot, but there is still the parseJSONStream function which is
arguably no less important. BTW, you have only mentioned the DOM part so far, but
for any code where performance is a priority, the stream-based pull parser
is basically the way to go. This would also be the natural entry point for any
serialization library.

Agreed elsewhere. But still, I am not seeing a range interface on the functions. The lexer, for example, does not accept an input range of characters.

Having a range interface is absolutely critical, and is the thing I am the most adamant about with all new Phobos additions. Any function that accepts arbitrarily long data should accept an input range instead, any function that generates arbitrary data should present that as an input range.

Any function that builds a container should accept an input range to fill that container with. Any function that builds a container should also be an output range.
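
In signature terms, that means something like the following (a sketch with a made-up function, not the proposed declarations):

    import std.range : isInputRange, ElementEncodingType;
    import std.traits : isSomeChar;

    // Arbitrarily long input arrives as an input range of characters,
    // never as a concrete string or array type.
    size_t countBraces(R)(R input)
        if (isInputRange!R && isSomeChar!(ElementEncodingType!R))
    {
        size_t n;
        foreach (c; input)
            if (c == '{')
                ++n;
        return n;
    }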


And my prediction is, if we do it right, that working with JSON will in most
cases simply mean "S s = deserializeJSON(json_input);", where S is a D struct
that gets populated with the deserialized JSON data.

json_input must be an input range of characters.


Where that doesn't fit,
performance oriented code would use the pull parser.

I am not sure what you mean by 'pull parser'. Do you mean the parser presents an input range as its output, and incrementally parses only as the next value is requested?
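
I.e. roughly this usage pattern, where iteration is what drives the work (a toy stand-in using splitter, not the proposed parser):

    import std.algorithm.iteration : splitter;
    import std.stdio : writeln;

    void main()
    {
        // splitter does no work up front; the next piece is produced only
        // when the consumer asks for it via front/popFront.
        auto tokens = `1,2,3,"four"`.splitter(',');
        foreach (t; tokens)
            writeln(t);
    }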


So the DOM part of the
system, which is the only thing the current JSON module has, will only be left
as a niche functionality.

That's ok. Is it normal practice to call the JSON data structure a Document Object Model?


To create output:

     auto r = ast.toChars();  // r is an InputRange of characters
     writeln(r);

Do we have an InputRange version of the various number-to-string conversions?

We do now, at least for integers. I plan to do ones for floating point.
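
For the integer case it looks roughly like this (a sketch; assuming the std.conv.toChars name and interface):

    import std.conv : toChars;
    import std.stdio : writeln;

    void main()
    {
        auto r = 1234.toChars();  // lazy range of chars, no GC allocation
        writeln(r);               // writes 1234 without building a string
    }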


It would be quite inconvenient to reinvent those (double, long, BigInt) in the JSON package.

Right. It's been reinvented multiple times in Phobos, which is absurd. If you're reinventing them in std.data.json, then we're doing something wrong again.


Of course, using to!string internally would be an option, but it would
obviously destroy all @nogc opportunities and performance benefits.

That's exactly why the range versions were done.


So, we'll need:
     toJSON
     toChars
     JSONException

The possible JSON values are:
     string
     number
     object (associative arrays)
     array
     true
     false
     null

Since these are D builtin types, they can actually be a simple union of
D builtin types.

The idea is to have JSONValue be a simple alias to Algebraic!(...), except that
there are currently still some workarounds for DMD < 2.067.0 on top, which means
that JSONValue is a struct that "alias this" inherits from Algebraic for the
time being. Those workarounds will be removed when the code is actually put into
Phobos.

But a simple union would obviously not be enough, it still needs a type tag of
some form and needs to provide a @safe interface on top of it.

Agreed.
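
Roughly this shape, in other words (a sketch, not the proposed definition; This is std.variant's placeholder for self-referential cases):

    import std.variant : Algebraic, This;

    // A DOM value as a plain Algebraic over the JSON value kinds.
    alias JSONValue = Algebraic!(typeof(null), bool, long, double, string,
                                 This[], This[string]);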


Algebraic is the only thing that comes close right now,
but I'd really prefer to have a fully
statically typed version of Algebraic that uses an enum as the type tag instead
of working with delegates/typeinfo.

If Algebraic is not good enough for this, it is a failure and must be fixed.
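
For the record, this is the kind of querying it has to support well (a sketch using std.variant.visit as it exists today, not the proposed JSONValue):

    import std.stdio : writeln;
    import std.variant : Algebraic, visit;

    void describe(Algebraic!(long, double, string) v)
    {
        // visit dispatches on the type currently held by the value
        v.visit!((long n)   => writeln("integer: ", n),
                 (double d) => writeln("number: ", d),
                 (string s) => writeln("string: ", s));
    }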


There is a decision needed about whether toJSON() allocates data or
returns slices into its inputrange. This can be 'static if' tested by checking
whether the inputrange can return immutable slices.

The test is currently "is(T == string) || is (T == immutable(ubyte)[])", but
slicing is done in those cases and the non-DOM parser interface is even @nogc as
long as exceptions are disabled.

With a range interface, you can test for 1) hasSlicing and 2) if ElementEncodingType is immutable.
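
Sketched out (helper name hypothetical):

    import std.range : hasSlicing, ElementEncodingType;

    // Note that Phobos' hasSlicing is deliberately false for auto-decoded
    // narrow strings, so a real check would still special-case string input.
    enum canReturnImmutableSlices(R) =
        hasSlicing!R && is(ElementEncodingType!R == immutable);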

Why is ubyte being accepted? The ECMA-404 spec sez: "Conforming JSON text is a sequence of Unicode code points".


toChars() can take a compile
time argument to determine if it is 'pretty' or not.

As long as JSON DOM values are stored in a generic Algebraic (which is a huge
win in terms of interoperability!), toChars won't suffice as a name.

Why not?

It would have to be toJSON(Chars) (as it basically is now). I gave the "pretty"
version a separate name simply because it's more convenient to use, and pretty
printing will probably be by far the most frequently used option when converting
to a string.

So make pretty printing the default. In fact, I'm skeptical that a non-pretty printed version is worth while. Note that an adapter algorithm can strip redundant whitespace.
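
A toy version of such an adapter, just to illustrate the point (eager and string-returning here; a real one would be a lazy range):

    // Removes whitespace outside of JSON string literals.
    string stripJSONWhitespace(string s)
    {
        string result;
        bool inString, escaped;
        foreach (c; s)
        {
            if (inString)
            {
                result ~= c;
                if (escaped)          escaped = false;
                else if (c == '\\')   escaped = true;
                else if (c == '"')    inString = false;
            }
            else if (c == '"')
            {
                inString = true;
                result ~= c;
            }
            else if (c != ' ' && c != '\t' && c != '\n' && c != '\r')
            {
                result ~= c;
            }
        }
        return result;
    }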
