On Tuesday, 23 June 2015 at 14:06:38 UTC, Sönke Ludwig wrote:
As I understand it, there is a gap between what you can currently do
with std.json (and indeed vibe.d's JSON module) and what you can do with
stdx.data.json. And the capability falls short of what can be done in
other standard libraries, such as Python's.

So since we are going for a nuclear-power-station-included approach, does that not mean that we need to specify what this layer should do,
and that somebody should start to work on it?

One thing, which I consider the most important missing building block, is Jacob's anticipated std.serialization module [1]*. Skipping the data representation layer and going straight for statically typed access to the data is the way to go in a language such as D, at least in most situations.

Thanks, Sönke. I appreciate your taking the time to reply, and I hope I represented my understanding of things correctly. I think things often get stuck in limbo because people don't know what's most useful, so I do think a central list of "things that need to be done" in the D ecosystem might be nice, as long as it doesn't become excessively structured and bureaucratic. (I ain't volunteering to maintain it, as I can't commit to it.)

Thing is, there are different use cases. For example, I pull data from Quandl: the metadata is standard and won't change in format often, but the data for a particular series will. If I pull volatility data, for instance, that will have different fields to price or economic data, and I don't know beforehand the total set of possibilities. This must be quite a common use case, and indeed I just hit another one recently with a poorly documented internal corporate database for securities.
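To make that concrete, here is a rough sketch of the split I mean, using only the existing std.json. The payload, the field names and the SeriesMeta struct are all made up for illustration; this is not Quandl's actual format. The fixed metadata goes into a struct, while the fields of each row are discovered at run time.

```d
import std.algorithm : sort;
import std.json;
import std.stdio;

// Hypothetical metadata layout; these field names are invented.
struct SeriesMeta
{
    string code;
    string name;
}

// Return the field names present in one row, sorted so the output is stable
// (associative arrays have no guaranteed iteration order).
string[] fieldNames(JSONValue row)
{
    auto keys = row.object.keys;
    sort(keys);
    return keys;
}

void main()
{
    // Made-up payload: fixed metadata plus rows whose fields vary.
    auto doc = parseJSON(`{
        "code": "VOL/XYZ",
        "name": "Example volatility series",
        "rows": [
            {"date": "2015-06-22", "iv30": 0.21, "hv30": 0.18},
            {"date": "2015-06-23", "iv30": 0.22}
        ]
    }`);

    // Static part: the layout is known in advance, so copy it into a struct.
    auto meta = SeriesMeta(doc["code"].str, doc["name"].str);
    writeln(meta.code, ": ", meta.name);

    // Dynamic part: ask each row which fields it actually has.
    foreach (row; doc["rows"].array)
    {
        foreach (key; fieldNames(row))
            writeln("  ", key, " = ", row[key].toString());
    }
}
```

The second row simply lacks a field the first one has, and nothing breaks; that tolerance is the part that is awkward to get from a purely static mapping.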

Maybe it's fine to generate the static typing in response to reading the data, but then it ought to be easy to do so (ultimately). Otherwise you hack something up in Python because it's just easier, and that hack job becomes the basis for something larger than you ever intended or wanted, and it's never worth rewriting given the other stuff you need to do.
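By "generate the static typing in response to reading the data" I mean something along these lines: look at a sample record and emit a struct declaration for it. This is only a sketch under my own assumptions. The helpers dTypeOf and structFor are hypothetical names, not part of std.json or stdx.data.json, and it uses the JSONType enum, so it needs a reasonably recent compiler.

```d
import std.algorithm : sort;
import std.json;
import std.stdio;

// Map a JSON value to a D type name. Deliberately crude: arrays, nested
// objects and nulls fall back to JSONValue, i.e. they stay dynamically typed.
string dTypeOf(JSONValue v)
{
    switch (v.type)
    {
        case JSONType.string:   return "string";
        case JSONType.integer:  return "long";
        case JSONType.uinteger: return "ulong";
        case JSONType.float_:   return "double";
        case JSONType.true_:
        case JSONType.false_:   return "bool";
        default:                return "JSONValue";
    }
}

// Build the source text of a struct with one field per key of the sample.
// Assumes every key is a valid D identifier; real data would need escaping.
string structFor(string name, JSONValue sample)
{
    auto keys = sample.object.keys;
    sort(keys); // stable field order, independent of hash order

    string code = "struct " ~ name ~ "\n{\n";
    foreach (key; keys)
        code ~= "    " ~ dTypeOf(sample[key]) ~ " " ~ key ~ ";\n";
    return code ~ "}\n";
}

void main()
{
    // Made-up sample row, not a real data format.
    auto row = parseJSON(`{"date": "2015-06-23", "iv30": 0.22, "volume": 1200}`);
    write(structFor("VolRow", row));
}
```

This only produces text at run time. Turning it into a type you can actually use means either pasting the output into a source file or doing the equivalent at compile time from a sample known then, and that last step is exactly what ought to be easy.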

But even if you prefer static typing generated on the fly (which may become useful via introspection, à la Alexandrescu's talk), sometimes one will prefer dynamic typing, and since it's easy to offer in a way that doesn't destroy the elegance and coherence of the whole project, why not give people the option? It seems to me that Guido painted a target on Python by saying "it's fast enough, and you are usually I/O etc. bound", because the numerical computing people have different needs. So BLAS and the like may be part of that, but having something like pandas, and the ability to get data in and out of it, would also be an important part of making it easy and fun to use D for this purpose. It's not so hard to do, just a fair bit of work. Not that it makes sense to undergo a death march to duplicate Python functionality, but there are some things that are relatively easy and have a high payoff, like John Colvin's pydmagic.

(The link here, which may not be so obvious, is that in a way pandas is a kind of replacement for a spreadsheet, and being able to just pull stuff in without minding your 'p's and 'q's to get a quick result lends itself to the kind of iterative exploration that makes spreadsheets still overused even today. And that's the link to JSON and (de)serialization.)

Another part is a high-level layer on top of the stream parser that has existed for a while (albeit with room for improvement), but that I forgot to update the documentation for. I've now caught up on that and it can be found under [2] - see the read[...] and skip[...] functions.

Thank you for the link.

Do you, or anyone else, have further ideas for higher-level functionality, or any concrete examples in other standard libraries?

Will think it through and try to come up with some simple examples. Paging John Colvin and Russell Winder, too.

* Or any other suitable replacement, if that doesn't work out for some reason. The vibe.data.serialization module to me is not a suitable candidate as it stands, because it lacks some features of Jacob's solution, such as proper handling of (duplicate/interior) references. But it's a perfect fit for my own class of problems, so I currently can't justify putting work into this either.

Is it worth you or someone else trying to articulate clearly what it does well that is missing from stdx.data.json?
