On 12.08.2015 at 00:21, deadalnix wrote:
On Tuesday, 11 August 2015 at 21:06:24 UTC, Sönke Ludwig wrote:
See
http://s-ludwig.github.io/std_data_json/stdx/data/json/value/JSONValue.payload.html
The question of whether each field is "really" needed obviously depends
on the application. However, the biggest type is BigInt, which, from a
quick look, contains a dynamic array plus a bool field, so it's not as
compact as it could be, but also not really large. There is also an
additional Location field that is sometimes important for good error
messages and the like, and sometimes totally unneeded.
Urg. Looks like BigInt should steal a bit somewhere instead of carrying a
bool like this. That is not really your lib's fault, but it's quite a
heavy cost.
Consider this: if the struct fits into 2 registers, it will be passed
around as such rather than in memory. That is a significant difference,
for BigInt itself and, by proxy, for the JSON library.
Agreed, that was my thought as well. Considering that BigInt is heavy
anyway, Dimitry's suggestion to store a "BigInt*" sounds like a good
idea to sidestep that issue, though.
Putting the BigInt issue aside, it seems like the biggest field in there
is an array of JSONValues or a string. For the string, you can
artificially shorten the length field by 3 bits to stick a tag in there;
that still allows absurdly large strings. For the JSONValue array case,
the pointer's alignment is such that you can steal 3 bits from it, or,
as with the string, the length field can be used.
It seems very realizable to me to have the JSONValue struct fit into 2
registers, granted the tag fits in 3 bits (8 different types).
I can help with that if you want to.
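A minimal sketch of what that could look like, assuming 64-bit targets;
the names below (Kind, CompactJSONValue) are made up for illustration and
are not the actual std_data_json types:

enum Kind : ubyte
{
    null_, boolean, integer, floating, bigInt, string_, array, object
}

struct CompactJSONValue
{
    private void*  ptr;     // string/array data, or a BigInt* for big numbers
    private size_t lenTag;  // upper bits: length, lowest 3 bits: Kind

    @property Kind kind() const { return cast(Kind)(lenTag & 0b111); }
    @property size_t length() const { return lenTag >> 3; }

    this(string s)
    {
        ptr = cast(void*)s.ptr;
        lenTag = (s.length << 3) | Kind.string_;
    }

    @property string str() const
    {
        assert(kind == Kind.string_);
        return (cast(immutable(char)*)ptr)[0 .. length];
    }
}

// The whole value fits in two machine words, i.e. two registers:
static assert(CompactJSONValue.sizeof == 2 * size_t.sizeof);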
The question is mainly whether we should decide on a single way to
represent values (optimized either for speed or for features), or let
the library user decide, either by making JSONValue a template or by
providing two separate structs optimized for each case.
In the latter case, we could really optimize on all fronts and for
example use custom containers that need fewer allocations and are more
cache friendly than the built-in ones.
However, my goal when implementing this has never been to make the DOM
representation as efficient as possible. The simple reason is that a DOM
representation is inherently inefficient compared to operating on the
structure with either the pull parser or a deserializer that directly
converts into a static D type. IMO these should be advertised instead of
trying to milk a dead cow (in terms of performance).
Indeed. Still, JSON nodes should be as lightweight as possible.
2/ As far as I can see, the elements are discriminated using typeid. An
enum is preferable, as the compiler would know the values ahead of time
and optimize based on them. It also allows the use of things like final
switch.
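For illustration, a final switch over a hypothetical tag enum forces the
compiler to check that every member is handled, and lets it emit a jump
table since all values are known at compile time:

enum JSONType { null_, boolean, integer, floating, string_, array, object }

string describe(JSONType t)
{
    final switch (t) // compile error if a JSONType member is missing
    {
        case JSONType.null_:    return "null";
        case JSONType.boolean:  return "boolean";
        case JSONType.integer:  return "integer";
        case JSONType.floating: return "floating point";
        case JSONType.string_:  return "string";
        case JSONType.array:    return "array";
        case JSONType.object:   return "object";
    }
}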
Using a tagged-union-like structure is definitely what I'd like to
have, too. However, the main goal was to build the DOM type upon a
generic algebraic type instead of using a home-brew tagged union. The
reason is that it automatically makes different DOM types with a
similar structure interoperable (JSON/BSON/TOML/...).
That is a great point that I hadn't considered. I'd go the other way
around about it: provide a compatible typeid-based struct derived from
the enum-tagged one, for compatibility. It can even be an alias this, so
the transition is transparent.
The transformation is not bijective, so it'd be great to get the most
restrictive form (the enum) and fall back to the least restrictive one
(alias this) when wanted.
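A hedged sketch of that alias this idea; EnumTaggedValue,
TypeidBasedValue and Kind are hypothetical stand-ins for the real types:

enum Kind { null_, boolean, string_ /* ... */ }

struct TypeidBasedValue
{
    TypeInfo type; // runtime discriminator, as in the current design
    // ... payload union ...
}

struct EnumTaggedValue
{
    Kind kind; // compact tag, known at compile time
    // ... payload union ...

    // Expose the old representation so existing code keeps compiling.
    // The conversion drops the enum's static guarantees, which is why
    // it only goes in this direction.
    @property TypeidBasedValue legacy() const
    {
        TypeidBasedValue v;
        final switch (kind)
        {
            case Kind.null_:   v.type = typeid(typeof(null)); break;
            case Kind.boolean: v.type = typeid(bool);         break;
            case Kind.string_: v.type = typeid(string);       break;
        }
        return v;
    }
    alias legacy this;
}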
As long as the set of types is fixed, it would even be bijective.
Anyway, I've just started to work on a generic variant of an enum-based
algebraic type that exploits as much static type information as
possible. If that works out (compiler bugs?), it would be a great thing
to have in Phobos, so maybe it's worth delaying the JSON module for that
if necessary.
The optimization to store the type enum in the length field of dynamic
arrays could also be built into the generic type.
Now Phobos unfortunately only has Algebraic, which not only doesn't have
a type enum, but is currently also really bad at keeping static type
information when forwarding function calls or operators. The only
options were basically either to resort to Algebraic for now and have
something that works, or to first implement an alternative algebraic
type and get it accepted into Phobos, which would delay the whole
process nearly indefinitely.
That's fine. Done is better than perfect. Still, API changes tend to be
problematic, so we need to nail down that part at least, and an enum
with a fallback to the typeid-based solution seems like the best option.
Yeah, the transition is indeed problematic. Sadly, the "alias this" idea
wouldn't work for that either, because operators and methods of the
enum-based algebraic type usually have different return types.
Or do you perhaps mean the JSON -> deserialize -> manipulate ->
serialize -> JSON approach? That definitely is not a "loser
strategy"*, but yes, it is limited to applications where you have a
partially fixed schema. However, arguably most applications fall into
that category.
Yes.
Just to state explicitly what I mean: This strategy has the most
efficient in-memory storage format and profits from all the static type
checking niceties of the compiler. It also means that there is a
documented schema in the code that can be used for reference by the
developers and that will automatically be verified by the serializer,
resulting in less and better-checked code. So, where applicable, I claim
that this is the best strategy for working with such data.
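As a sketch of that round trip; deserializeJSON and serializeToJSON are
hypothetical placeholders for whatever the serializer entry points end
up being called:

// Hypothetical serializer entry points:
T deserializeJSON(T)(string json);
string serializeToJSON(T)(T value);

struct Employee
{
    string   name;   // the struct itself is the documented schema
    int      age;
    string[] roles;
}

void example(string input)
{
    // Any schema violation in `input` is caught by the serializer here.
    auto e = deserializeJSON!Employee(input);

    e.roles ~= "reviewer"; // plain, statically type-checked D code

    string output = serializeToJSON(e);
}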
For maximum efficiency, it can also be transparently combined with the
pull parser. The pull parser can, for example, be used to jump between
array entries, and the serializer then reads each individual array entry.
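Roughly like this; all names here (parseJSONStream, TokenRange,
deserializeJSON) are again illustrative assumptions rather than the
confirmed API:

// Illustrative stand-ins for the pull parser and serializer:
struct TokenRange { bool empty; /* ... */ }
TokenRange parseJSONStream(string json);     // lazy pull parser
T deserializeJSON(T)(ref TokenRange tokens); // consumes exactly one value

struct Employee { string name; int age; }

long sumAges(string input)
{
    auto tokens = parseJSONStream(input);
    long total = 0;
    // The pull parser positions us on each array entry; the serializer
    // reads just that entry into a static type. No DOM for the whole
    // document is ever materialized.
    while (!tokens.empty)
        total += deserializeJSON!Employee(tokens).age;
    return total;
}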