On 1/2/18 1:01 PM, Christian Köstlin wrote:
On 02.01.18 15:09, Steven Schveighoffer wrote:
On 1/2/18 8:57 AM, Adam D. Ruppe wrote:
On Tuesday, 2 January 2018 at 11:22:06 UTC, Stefan Koch wrote:
You can make it much faster by using a sliced static array as buffer.
Only if you want data corruption! It keeps a copy of your pointer
internally: https://github.com/dlang/phobos/blob/master/std/zlib.d#L605
It also will always overallocate new buffers on each call
<https://github.com/dlang/phobos/blob/master/std/zlib.d#L602>
There is no efficient way to use it. The implementation is substandard
because the API limits the design.
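To make the pointer hazard concrete, here is a minimal sketch of feeding std.zlib's UnCompress in chunks while handing it a fresh copy of every chunk, so nothing it may still reference is overwritten later. The helper name gunzipChunked, the chunk size, and the sample text are made up for illustration. It assumes the retained input pointer is the only hazard, and it does nothing about the per-call allocations.

```d
import std.algorithm.comparison : min;
import std.zlib : Compress, HeaderFormat, UnCompress;

/// Hypothetical helper: inflate gzip data that arrives in fixed-size chunks.
ubyte[] gunzipChunked(const(ubyte)[] compressed, size_t chunkSize)
{
    auto decomp = new UnCompress(HeaderFormat.gzip);
    ubyte[] result;
    while (compressed.length)
    {
        const n = min(chunkSize, compressed.length);
        // Fresh copy per call: UnCompress may keep pointing into the buffer
        // it was given, so that buffer must never be reused or overwritten.
        auto chunk = compressed[0 .. n].dup;
        result ~= cast(const(ubyte)[]) decomp.uncompress(chunk);
        compressed = compressed[n .. $];
    }
    // Collect whatever the decompressor still holds back.
    result ~= cast(const(ubyte)[]) decomp.flush();
    return result;
}

void main()
{
    // Round trip through std.zlib's own compressor so the example is self-contained.
    auto text = cast(const(ubyte)[]) "some repeated text, some repeated text";
    auto comp = new Compress(HeaderFormat.gzip);
    auto packed = cast(const(ubyte)[]) comp.compress(text)
                ~ cast(const(ubyte)[]) comp.flush();
    assert(gunzipChunked(packed, 5) == text);
}
```

The extra .dup per chunk is exactly the kind of copying that makes this API hard to use efficiently; the sketch trades speed for safety.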
iopipe handles this quite well. And deals with the buffers properly
(yes, it is very tricky. You have to ref-count the zstream structure,
because it keeps internal pointers to *itself* as well!). And no, iopipe
doesn't use std.zlib, I use the etc.zlib functions (but I poached some
ideas from std.zlib when writing it).
https://github.com/schveiguy/iopipe/blob/master/source/iopipe/zip.d
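For comparison, a rough sketch of calling the etc.c.zlib bindings directly with one reused output buffer. This is not iopipe's implementation: gunzipDirect and the 4 KiB scratch size are hypothetical, and the whole compressed input is assumed to be in memory already. The z_stream stays in one stack slot for its entire lifetime, so it is never moved and the self-pointer problem does not come up.

```d
import etc.c.zlib;
import std.zlib : Compress, HeaderFormat;

/// Hypothetical helper: inflate a complete in-memory gzip buffer.
ubyte[] gunzipDirect(ubyte[] input)
{
    z_stream zs;                            // must not be copied or moved after init
    // windowBits 15 + 16 tells zlib to expect a gzip header and trailer.
    if (inflateInit2(&zs, 15 + 16) != Z_OK)
        throw new Exception("inflateInit2 failed");
    scope (exit) inflateEnd(&zs);

    zs.next_in = input.ptr;
    zs.avail_in = cast(uint) input.length;

    ubyte[4096] scratch;                    // reused for every inflate call
    ubyte[] result;
    int err;
    do
    {
        zs.next_out = scratch.ptr;
        zs.avail_out = cast(uint) scratch.length;
        err = inflate(&zs, Z_NO_FLUSH);
        if (err != Z_OK && err != Z_STREAM_END)
            throw new Exception("inflate failed");
        // Copy out only the bytes produced in this round.
        result ~= scratch[0 .. scratch.length - zs.avail_out];
    }
    while (err != Z_STREAM_END);
    return result;
}

void main()
{
    // Produce gzip test data with std.zlib so the example is self-contained.
    auto text = cast(ubyte[]) "some repeated text, some repeated text".dup;
    auto comp = new Compress(HeaderFormat.gzip);
    auto packed = cast(ubyte[]) comp.compress(text) ~ cast(ubyte[]) comp.flush();
    assert(gunzipDirect(packed) == text);
}
```

Truncated input ends in an exception here, because inflate reports an error once it can make no further progress.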
I even wrote a json parser for iopipe. But it's far from complete. And
probably needs updating since I changed some of the iopipe API.
https://github.com/schveiguy/jsoniopipe
Depending on the use case, it might be enough, and should be very fast.
Thanks, Steve, for this proposal (actually, I already had an iopipe version
on my hard disk that I applied to this problem). It's more or less your
unzip example plus putting the data into an appender (I hope this is how it
should be done to get the data into RAM).
Well, you don't need to use appender for that (and doing so is copying a
lot of the data an extra time). All you need is to extend the pipe until
there isn't any more new data, and it will all be in the buffer.
// almost the same line from your current version
auto mypipe = openDev("../out/nist/2011.json.gz")
    .bufd.unzip(CompressionFormat.gzip);
// This line here will work with the current release (0.0.2):
while(mypipe.extend(0) != 0) {}
// But I have a fix for a bug that hasn't been released yet; this would
// work if you use iopipe-master:
mypipe.ensureElems();
// getting the data is as simple as looking at the buffer.
auto data = mypipe.window; // ubyte[] of the data
iopipe is already better than the normal dlang version, almost like
Java, but still far from the solution. I updated
https://github.com/gizmomogwai/benchmarks/tree/master/gunzip
I will give the direct gunzip calls a try ...
In terms of JSON parsing, I had really nice results with the fast.json
pull parser, but it's a bit of an apples-to-oranges comparison, because
I did not pull out all the data there.
Yeah, with jsoniopipe being very raw, I wouldn't be sure it was usable
in your case. The end goal is to have something fast, but very easy to
construct. I wasn't planning on focusing on the speed (yet) like other
libraries do, but ease of writing code to use it.
-Steve