I deal with datasets that consist of many millions of double-floats. In the past, I have handled storing them in binary form using make-double-float, double-float-high-bits, etc. Since this is one of the annoying bottlenecks in my system, I decided to see what else I could do.
If all I want are files containing a large array of double-floats, I can get about a factor of 15 (going from 1 second to .06 seconds to write a length 1,000,000 array, for example) by simply using low-level IO functions: I wrap fopen and fwrite, and pass the address of the array. This is a big win.

However, I have metadata. What would be an even bigger win is if I could use files that contained headers that were Lisp data (to be read by the CL reader) while still getting at the arrays themselves using fread. Is this possible?

I thought that make-fd-stream was my friend, but if I fopen a file, use fileno to get a file descriptor, and then call make-fd-stream, I cannot interleave reads and writes at the C level and the Lisp level. Going in reverse, if I open a file in CL, I can get the file descriptor, but I don't see how to get back to a usable FILE *. Is there any way around this? Otherwise, I'll have to consider getting the speedup by putting the metadata and the binary data in separate files, which is more trouble.

Note: On the "write" end, I can of course also get the speedup by first writing the header, closing the file, and reopening it in "append" mode. I don't see an easy way to do this on the read end, although maybe I could muck about with fstream somehow.

As always, I'm open to any ideas or suggestions.

Cheers, rif
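For concreteness, here is a rough C-only sketch of the file layout described above: a text header, then the raw doubles appended after closing and reopening the file. The function names `write_with_header` and `read_after_header` are made up for illustration, and the sketch assumes the header fits on a single line with no embedded newline, which a real Lisp form need not satisfy. It does not address the actual question of sharing one open file between the CL reader and fread; it only shows what the bytes on disk would look like.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write a one-line text header, close the file,
   then reopen it in append mode and dump the doubles with one fwrite.
   Returns 0 on success, -1 on failure. */
int write_with_header(const char *path, const char *header,
                      const double *data, size_t n)
{
    assert(path && header && data);

    FILE *f = fopen(path, "w");
    if (!f) return -1;
    int ok = fputs(header, f) >= 0 && fputc('\n', f) != EOF;
    if (fclose(f) != 0 || !ok) return -1;

    /* Reopen in binary append mode so the array lands after the header. */
    f = fopen(path, "ab");
    if (!f) return -1;
    ok = fwrite(data, sizeof *data, n, f) == n;
    if (fclose(f) != 0 || !ok) return -1;
    return 0;
}

/* Hypothetical helper: skip everything up to and including the first
   newline (assumed to end the header), then fread n doubles.
   Returns 0 on success, -1 on failure. */
int read_after_header(const char *path, double *out, size_t n)
{
    assert(path && out);

    FILE *f = fopen(path, "rb");
    if (!f) return -1;

    int c;
    while ((c = fgetc(f)) != EOF && c != '\n')
        ;   /* discard header bytes */

    int ok = (c == '\n') && fread(out, sizeof *out, n, f) == n;
    fclose(f);
    return ok ? 0 : -1;
}
```

A call such as `write_with_header("data.bin", "(:length 4 :type double-float)", xs, 4)` followed by `read_after_header("data.bin", ys, 4)` should round-trip the array on the same machine. The file is not portable across platforms with a different byte order or double representation.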
