On Tuesday, 23 June 2015 at 14:06:38 UTC, Sönke Ludwig wrote:
>> As I understand it, there is a gap between what you can currently do with std.json (and indeed vibe.d's JSON) and what you can do with stdx.data.json. And the capability falls short of what can be done in other standard libraries such as the ones for Python.
>> So since we are going for a nuclear-power-station-included approach, does that not mean that we need to specify what this layer should do, and somebody should start to work on it?
> One thing, which I consider the most important missing building block, is Jacob's anticipated std.serialization module [1]*. Skipping the data representation layer and going straight for statically typed access to the data is the way to go in a language such as D, at least in most situations.
Thanks, Sönke. I appreciate your taking the time to reply, and I hope I represented my understanding of things correctly. I think things often get stuck in limbo because people don't know what's most useful, so a central list of "things that need to be done" in the D ecosystem might be nice, provided it doesn't become excessively structured and bureaucratic. (I ain't volunteering to maintain it, as I can't commit to it.)
The thing is, there are different use cases. For example, I pull data from Quandl: the metadata is standard and won't change in format often, but the data for a particular series will. If I pull volatility data, it will have different fields from price or economic data, and I don't know beforehand the total set of possibilities. This must be quite a common use case, and indeed I just hit another one recently with a poorly documented internal corporate database for securities.
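To make that concrete, this is roughly the dynamic-access style I have in mind. It's a minimal sketch using Phobos's std.json; the "quandl.json" file and the "dataset" key are invented for illustration, not Quandl's actual schema:

    import std.file : readText;
    import std.json;
    import std.stdio;

    void main()
    {
        // Parse a response whose exact fields aren't known in advance.
        auto doc = parseJSON(readText("quandl.json"));

        // Walk whatever fields happen to be present ("dataset" is a
        // made-up key for this sketch).
        foreach (name, value; doc["dataset"].object)
            writeln(name, " -> ", value);
    }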
Maybe it's fine to generate the static typing in response to reading the data, but then it ought to be easy to do so (ultimately). Because otherwise you hack something up in Python because it's just easier, and that hack job becomes the basis for something larger than you ever intended or wanted, and it's never worth rewriting given the other stuff you need.
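For comparison, here is a rough sketch of the statically typed route, using vibe.d's deserializeJson purely as a stand-in for whatever std.serialization ends up providing; the Series struct and its fields are invented for illustration:

    import vibe.data.json : deserializeJson;

    // Invented shape, for illustration only.
    struct Series
    {
        string name;
        double[] values;
    }

    void main()
    {
        auto s = deserializeJson!Series(`{"name":"vol","values":[12.1,13.4]}`);
        assert(s.name == "vol" && s.values.length == 2);
    }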
But even if you prefer static typing generated on the fly (which maybe becomes useful via introspection, a la Alexandrescu's talk), sometimes one will prefer dynamic typing, and since it's easy to offer in a way that doesn't destroy the elegance and coherence of the whole project, why not give people the option? It seems to me that Guido painted a target on Python by saying "it's fast enough, and you are usually I/O etc. bound", because the numerical computing people have different needs. So BLAS and the like may be part of that, but also having something like pandas - and the ability to get data in and out of it - would be an important part of making it easy and fun to use D for this purpose, and it's not so hard to do, just a fair bit of work. Not that it makes sense to undergo a death march to duplicate Python functionality, but there are some things that are relatively easy and have a high payoff - like John Colvin's pydmagic.
(The link here, which may not be so obvious, is that in a way pandas is a kind of replacement for a spreadsheet, and being able to just pull stuff in without minding your p's and q's to get a quick result lends itself to the kind of iterative exploration that keeps spreadsheets overused even today. And that's the link to JSON and (de)serialization.)
> Another part is a high-level layer on top of the stream parser that has existed for a while (albeit with room for improvement), but that I forgot to update the documentation for. I've now caught up on that and it can be found under [2] - see the read[...] and skip[...] functions.
Thank you for the link.
> Do you, or anyone else, have further ideas for higher-level functionality, or any concrete examples in other standard libraries?
Will think it through and try to come up with some simple
examples. Paging John Colvin and Russell Winder, too.
> * Or any other suitable replacement, if that doesn't work out for some reason. The vibe.data.serialization module to me is not a suitable candidate as it stands, because it lacks some features of Jacob's solution, such as proper handling of (duplicate/interior) references. But it's a perfect fit for my own class of problems, so I currently can't justify putting work into this either.
Is it worth you or someone else trying to articulate what it does well that is missing from stdx.data.json?