I deal with datasets that consist of many millions of double-floats.
In the past, I have handled storing them in binary form using
make-double-float, double-float-high-bits, etc.  Since this is one of
the annoying bottlenecks in my system, I decided to see what else I
could do.

If all I want to use are files containing a large array of
double-floats, I can get about a factor of 15 (going from 1 second to
.06 seconds to write a length 1,000,000 array, for example) by simply
using low-level IO functions --- I wrap fopen and fwrite, and pass the
address of the array.  This is a big win.
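
Concretely, the C side of the wrapper looks roughly like this (a
simplified sketch; the function name is mine, and error handling is
minimal):

```c
#include <stdio.h>
#include <stdlib.h>

/* Write n doubles from buf to path in native binary form.
   Returns 0 on success, -1 on failure. */
int write_doubles(const char *path, const double *buf, size_t n)
{
    FILE *fp = fopen(path, "wb");
    if (!fp)
        return -1;
    size_t written = fwrite(buf, sizeof(double), n, fp);
    fclose(fp);
    return written == n ? 0 : -1;
}
```

The Lisp side just passes the address of (the data vector of) the
array through the FFI, so the whole array goes out in one fwrite call
instead of a million per-element writes.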

However, I have metadata.  What would be an even bigger win is if I
could use files that contained headers that were Lisp data (to be
read by the CL reader) but I could still get at the arrays themselves
using fread.  Is this possible?  I thought that make-fd-stream was my
friend, but if I fopen a file, using fileno to get a file-descriptor,
and then use make-fd-stream, I cannot interleave reads and writes at
the C level and the Lisp level.  Going in reverse, if I open the file
in CL, I can get the file descriptor, but I don't see how to get back
to a usable FILE *.  Is there any way around this?  Otherwise, I'll
have to consider getting the speedup by putting the metadata and the
binary data in separate files, which is more trouble.

Note: On the "write" end, I of course can also get the speedup by
first writing the header, closing the file, and reopening in "append"
mode.  I don't see an easy way to do this on the read end, although
maybe I could muck about with fstream somehow.
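
For concreteness, here is the kind of single-file read I'd like to be
able to do on the C side.  This is a sketch under the assumption that
the header is a single line of Lisp-readable text followed
immediately by the raw doubles (the function name is mine):

```c
#include <stdio.h>
#include <stdlib.h>

/* Skip a one-line text header, then fread n doubles from the current
   offset.  The Lisp side could re-read the same header line with
   READ.  Returns a malloc'd buffer, or NULL on failure. */
double *read_payload(const char *path, size_t n)
{
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return NULL;
    char header[1024];
    if (!fgets(header, sizeof header, fp)) {  /* consume header line */
        fclose(fp);
        return NULL;
    }
    double *buf = malloc(n * sizeof *buf);
    if (buf && fread(buf, sizeof *buf, n, fp) != n) {
        free(buf);
        buf = NULL;
    }
    fclose(fp);
    return buf;
}
```

The same layout read from the Lisp side would be: READ the header,
note the file-position, and hand that offset to the C reader.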

As always, I'm open to any ideas or suggestions.

Cheers,

rif
