+1 for blosc. It's quite a nice bit of work, and if I remember correctly, its use is transparent from the user's perspective.

Cheers,
Kevin

On Tue, Sep 2, 2014 at 8:52 AM, Jake Bolewski <[email protected]> wrote:

> HDF5 supports pluggable compression schemes, so this seems like it should
> be handled within the hdf5 library. The fastest seems to be blosc which is
> written by the PyTables author. Although this is not shipped by default
> with HDF5, if we include it in the BinDeps builds for hdf5 it would be a
> nice compressed default format.
>
>
> On Tuesday, September 2, 2014 11:30:39 AM UTC-4, Douglas Bates wrote:
>>
>> Now that the JLD format can handle DataFrame objects I would like to
>> switch from storing data sets in .RData format to .jld format. Datasets
>> stored in .RData format are compressed after they are written. The default
>> compression is gzip. Bzip2 and xz compression are also available. The
>> compression can make a substantial difference in the file size because the
>> data values are often highly repetitive.
>>
>> JLD is different in scope in that .jld files can be queried using
>> external programs like h5ls and the files can have new data added or
>> existing data edited or removed. The .RData format is an archival format.
>> Once the file is written it cannot be modified in place.
>>
>> Given these differences I can appreciate that JLD files are not
>> compressed. Nevertheless I think it would be useful to adopt a convention
>> in the JLD module for accessing data from files with a .jld.xz or .jld.7z
>> extension. It could be as simple as uncompressing the files in a temporary
>> directory, reading then removing, or it could be more sophisticated. I
>> notice that my versions of libjulia.so on an Ubuntu 64-bit system are
>> linked against both libz.so and liblzma.so
>>
>> $ ldd /usr/lib/x86_64-linux-gnu/julia/libjulia.so
>>     linux-vdso.so.1 => (0x00007fff5214f000)
>>     libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f62932ee000)
>>     libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f62930d5000)
>>     libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6292dce000)
>>     librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f6292bc6000)
>>     libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f62929a8000)
>>     libunwind.so.8 => /usr/lib/x86_64-linux-gnu/libunwind.so.8 (0x00007f629278c000)
>>     libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6292488000)
>>     libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f6292272000)
>>     libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6291eab000)
>>     /lib64/ld-linux-x86-64.so.2 (0x00007f62944b3000)
>>     liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f6291c89000)
>>
>>
>> AFAIK the user-level interface to gzip requires the GZip package. Unless
>> I have missed something (always a possibility) there is no user-level
>> interface to liblzma in Julia. If the library is going to be linked
>> anyway, would it make sense to provide a user-level interface in Julia?
>>
>
