+1 for blosc.  It's quite a nice bit of work, and if I remember correctly,
from the user's perspective, its use is transparent.

Cheers,
   Kevin


On Tue, Sep 2, 2014 at 8:52 AM, Jake Bolewski <[email protected]>
wrote:

> HDF5 supports pluggable compression schemes, so this seems like it should
> be handled within the HDF5 library.  The fastest seems to be blosc, which is
> written by the PyTables author.  Although this is not shipped by default
> with HDF5, if we include it in the BinDeps builds for hdf5 it would be a
> nice compressed default format.
>
>
> On Tuesday, September 2, 2014 11:30:39 AM UTC-4, Douglas Bates wrote:
>>
>> Now that the JLD format can handle DataFrame objects I would like to
>> switch from storing data sets in .RData format to .jld format.  Datasets
>> stored in .RData format are compressed after they are written.  The default
>> compression is gzip.  Bzip2 and xz compression are also available.  The
>> compression can make a substantial difference in the file size because the
>> data values are often highly repetitive.
>>
>> JLD is different in scope in that .jld files can be queried using
>> external programs like h5ls and the files can have new data added or
>> existing data edited or removed.  The .RData format is an archival format.
>> Once the file is written it cannot be modified in place.
>>
>> Given these differences I can appreciate that JLD files are not
>> compressed.  Nevertheless I think it would be useful to adopt a convention
>> in the JLD module for accessing data from files with a .jld.xz or .jld.7z
>> extension.  It could be as simple as uncompressing the files in a temporary
>> directory, reading them, and then removing them, or it could be more
>> sophisticated.  I notice that my versions of libjulia.so on an Ubuntu
>> 64-bit system are linked against both libz.so and liblzma.so:
>>
>> $ ldd /usr/lib/x86_64-linux-gnu/julia/libjulia.so
>>     linux-vdso.so.1 =>  (0x00007fff5214f000)
>>     libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f62932ee000)
>>     libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f62930d5000)
>>     libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6292dce000)
>>     librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f6292bc6000)
>>     libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f62929a8000)
>>     libunwind.so.8 => /usr/lib/x86_64-linux-gnu/libunwind.so.8 (0x00007f629278c000)
>>     libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6292488000)
>>     libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f6292272000)
>>     libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6291eab000)
>>     /lib64/ld-linux-x86-64.so.2 (0x00007f62944b3000)
>>     liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f6291c89000)
>>
>>
>> AFAIK the user-level interface to gzip requires the GZip package.  Unless
>> I have missed something (always a possibility) there is no user-level
>> interface to liblzma in Julia.  If the library is going to be linked
>> anyway, would it make sense to provide a user-level interface in Julia?
>>
>
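The "simple" convention suggested in the quoted message (uncompress a
.jld.xz file into a temporary directory, read it, then remove it) could look
roughly like the sketch below.  It is written in Python rather than Julia,
using only the standard lzma and tempfile modules, purely to illustrate the
workflow.  The function name read_compressed and the reader callback are
hypothetical; they are not part of JLD or any existing package, and the
reader stands in for whatever actually opens the .jld file.

```python
import lzma
import shutil
import tempfile
from pathlib import Path


def read_compressed(path, reader):
    """Decompress an .xz file into a temporary directory, read it, clean up.

    `reader` is a hypothetical callable that takes a filesystem path and
    returns the loaded data (a stand-in for a JLD/HDF5 reader).
    """
    path = Path(path)
    if path.suffix != ".xz":
        raise ValueError(f"expected a .xz file, got {path.name!r}")
    # TemporaryDirectory deletes the directory and its contents on exit,
    # even if the reader raises an exception.
    with tempfile.TemporaryDirectory() as tmpdir:
        target = Path(tmpdir) / path.stem  # "data.jld.xz" -> "data.jld"
        with lzma.open(path, "rb") as src, open(target, "wb") as dst:
            # Copy in chunks so large files are not held in memory at once.
            shutil.copyfileobj(src, dst)
        return reader(target)
```

One consequence of this approach is that the reader must return fully
loaded data: the uncompressed file is gone once the function returns, so
anything that reads lazily from the file would no longer work.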
