Just to hype blosc a little more, see

http://www.blosc.org/blosc-in-depth.html

The main feature is that data is chunked so that each compressed chunk
fits into L1 cache, where it is then decompressed and used.  The link above
covers a few more buzzwords (multithreading, SIMD).  It would be worth
exploring where this might be useful in Julia.
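
To make the chunking idea concrete, here is a rough sketch in Python.  It
is not Blosc's API: the stdlib zlib module stands in for Blosc's codecs,
the 32 KiB chunk size is only an assumed L1 data cache size, and the
function names are made up for illustration.  There is no shuffling,
multithreading, or SIMD here.  The only point is that each chunk is
compressed on its own, so a reader can decompress just the chunk it needs.

```python
import zlib
from typing import List

# Assumption: 32 KiB is a common L1 data cache size, but it varies by CPU.
CHUNK_SIZE = 32 * 1024


def compress_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Split `data` into fixed-size chunks and compress each one independently."""
    return [
        zlib.compress(data[start:start + chunk_size])
        for start in range(0, len(data), chunk_size)
    ]


def read_chunk(chunks: List[bytes], index: int) -> bytes:
    """Decompress only the requested chunk; the others stay compressed."""
    return zlib.decompress(chunks[index])


def decompress_all(chunks: List[bytes]) -> bytes:
    """Decompress every chunk in order and reassemble the original buffer."""
    return b"".join(zlib.decompress(chunk) for chunk in chunks)
```

Because the chunks are independent, they could also be handed to separate
threads, which is roughly where the multithreading in the link comes in.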

Cheers,
  Kevin

On Tuesday, September 2, 2014, Tim Holy <[email protected]> wrote:

> HDF5/JLD does support compression:
>
> https://github.com/timholy/HDF5.jl/blob/master/doc/hdf5.md#reading-and-writing-data
>
> But it's not turned on by default. Matlab uses compression by default, and
> I've found it's a huge bottleneck in terms of performance
> (http://www.mathworks.com/matlabcentral/fileexchange/39721-save-mat-files-more-quickly).
> But perhaps there's a good middle ground. It would take someone
> doing a little experimentation to see what the compromises are.
>
> --Tim
>
> On Tuesday, September 02, 2014 08:30:39 AM Douglas Bates wrote:
> > Now that the JLD format can handle DataFrame objects I would like to switch
> > from storing data sets in .RData format to .jld format.  Datasets stored in
> > .RData format are compressed after they are written.  The default
> > compression is gzip.  Bzip2 and xz compression are also available.  The
> > compression can make a substantial difference in the file size because the
> > data values are often highly repetitive.
> >
> > JLD is different in scope in that .jld files can be queried using external
> > programs like h5ls and the files can have new data added or existing data
> > edited or removed.  The .RData format is an archival format.  Once the file
> > is written it cannot be modified in place.
> >
> > Given these differences I can appreciate that JLD files are not compressed.
> > Nevertheless I think it would be useful to adopt a convention in the JLD
> > module for accessing data from files with a .jld.xz or .jld.7z extension.
> > It could be as simple as uncompressing the files in a temporary directory,
> > reading then removing, or it could be more sophisticated.  I notice that my
> > versions of libjulia.so on an Ubuntu 64-bit system are linked against both
> > libz.so and liblzma.so
> >
> > $ ldd /usr/lib/x86_64-linux-gnu/julia/libjulia.so
> > linux-vdso.so.1 =>  (0x00007fff5214f000)
> > libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f62932ee000)
> > libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f62930d5000)
> > libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6292dce000)
> > librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f6292bc6000)
> > libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f62929a8000)
> > libunwind.so.8 => /usr/lib/x86_64-linux-gnu/libunwind.so.8 (0x00007f629278c000)
> > libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6292488000)
> > libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f6292272000)
> > libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6291eab000)
> > /lib64/ld-linux-x86-64.so.2 (0x00007f62944b3000)
> > liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f6291c89000)
> >
> >
> > AFAIK the user-level interface to gzip requires the GZip package.  Unless I
> > have missed something (always a possibility) there is no user-level
> > interface to liblzma in Julia.  If the library is going to be linked
> > anyway, would it make sense to provide a user-level interface in Julia?
>
>

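On Doug's simplest option above (uncompress into a temporary directory,
read, then remove), here is a rough sketch of that flow in Python using
the stdlib lzma module.  It is not anything JLD provides today:
`with_decompressed` and the `reader` callback are hypothetical names, and
`reader` stands in for whatever would actually load the .jld file.  It
only handles the .xz case, not .7z.

```python
import lzma
import os
import shutil
import tempfile


def with_decompressed(path, reader):
    """Decompress the .xz file at `path` into a temporary directory, call
    `reader` on the uncompressed copy, then remove the temporary directory."""
    name = os.path.basename(path)
    if name.endswith(".xz"):
        name = name[:-3]  # data.jld.xz -> data.jld
    with tempfile.TemporaryDirectory() as tmpdir:
        target = os.path.join(tmpdir, name)
        # Stream the decompressed bytes to disk instead of holding them in memory.
        with lzma.open(path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # The temporary directory is deleted once `reader` returns or raises.
        return reader(target)
```

The obvious cost is that the whole file is uncompressed to disk before
anything is read, which is why a more sophisticated approach might be
worth the effort for large data sets.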