All these testimonials do make it sound promising. Even three-fold compression 
is a pretty big deal.

One disadvantage to compression is that it makes mmap impossible. But, since 
HDF5 supports hyperslabs, that's not as big a deal as it would have been.
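
To make the hyperslab point concrete, here is a rough Python sketch using only the standard library. It is not how HDF5 stores data internally; the chunk size and the names `write_chunks` and `read_range` are made up for illustration. The idea it shows is that when each chunk is compressed independently, a partial read only has to decompress the chunks that overlap the requested range.

```python
import struct
import zlib

CHUNK = 1024  # values per chunk (hypothetical; chosen only for this sketch)

def write_chunks(values):
    """Compress a list of floats as independent zlib-compressed chunks."""
    chunks = []
    for start in range(0, len(values), CHUNK):
        block = values[start:start + CHUNK]
        raw = struct.pack("<%dd" % len(block), *block)  # little-endian float64
        chunks.append(zlib.compress(raw))
    return chunks

def read_range(chunks, lo, hi):
    """Return values[lo:hi], decompressing only the chunks that overlap it."""
    if lo >= hi or not chunks:
        return []
    first = lo // CHUNK
    last = min((hi - 1) // CHUNK, len(chunks) - 1)
    out = []
    for i in range(first, last + 1):
        raw = zlib.decompress(chunks[i])
        block = struct.unpack("<%dd" % (len(raw) // 8), raw)
        base = i * CHUNK  # index of the first value stored in this chunk
        out.extend(block[max(lo - base, 0):min(hi - base, len(block))])
    return out

# Highly repetitive data, like the data sets discussed in this thread.
data = [float(i % 7) for i in range(10000)]
chunks = write_chunks(data)
part = read_range(chunks, 1500, 3000)  # touches only chunks 1 and 2
```

The trade-off is the usual one: smaller chunks mean less wasted decompression on partial reads, but typically a worse compression ratio.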

--Tim

On Tuesday, September 02, 2014 12:11:55 PM Jake Bolewski wrote:
> I've used Blosc in the past with great success.  Oftentimes it is faster
> than the uncompressed version if IO is the bottleneck.  The compression
> ratios are not great but that is really not the point.
> 
> On Tuesday, September 2, 2014 2:09:20 PM UTC-4, Stefan Karpinski wrote:
> > That looks pretty sweet. It seems to avoid a lot of the pitfalls of
> > naively compressing data files while still getting the benefits. It would
> > be great to support that in JLD, maybe even turned on by default.
> > 
> > 
> > On Tue, Sep 2, 2014 at 1:35 PM, Kevin Squire wrote:
> >> Just to hype blosc a little more, see
> >> 
> >> http://www.blosc.org/blosc-in-depth.html
> >> 
> >> The main feature is that data is chunked so that the compressed chunk
> >> size fits into L1 cache, and is then decompressed and used there.  There
> >> are a few more buzzwords (multithreading, simd) in the link above. Worth
> >> exploring where this might be useful in Julia.
> >> 
> >> Cheers,
> >> 
> >>   Kevin
> >> 
> >> On Tuesday, September 2, 2014, Tim Holy wrote:
> >>> HDF5/JLD does support compression:
> >>> 
> >>> https://github.com/timholy/HDF5.jl/blob/master/doc/hdf5.md#reading-and-writing-data
> >>> 
> >>> But it's not turned on by default. Matlab uses compression by default, and
> >>> I've found it's a huge bottleneck in terms of performance
> >>> (http://www.mathworks.com/matlabcentral/fileexchange/39721-save-mat-files-more-quickly).
> >>> But perhaps there's a good middle ground. It would take someone doing a
> >>> little experimentation to see what the compromises are.
> >>> 
> >>> --Tim
> >>> 
> >>> On Tuesday, September 02, 2014 08:30:39 AM Douglas Bates wrote:
> >>> > Now that the JLD format can handle DataFrame objects I would like to switch
> >>> > from storing data sets in .RData format to .jld format.  Datasets stored in
> >>> > .RData format are compressed after they are written.  The default
> >>> > compression is gzip.  Bzip2 and xz compression are also available.  The
> >>> > compression can make a substantial difference in the file size because the
> >>> > data values are often highly repetitive.
> >>> > 
> >>> > JLD is different in scope in that .jld files can be queried using external
> >>> > programs like h5ls and the files can have new data added or existing data
> >>> > edited or removed.  The .RData format is an archival format.  Once the file
> >>> > is written it cannot be modified in place.
> >>> > 
> >>> > Given these differences I can appreciate that JLD files are not compressed.
> >>> > Nevertheless I think it would be useful to adopt a convention in the JLD
> >>> > module for accessing data from files with a .jld.xz or .jld.7z extension.
> >>> > It could be as simple as uncompressing the files in a temporary directory,
> >>> > reading then removing, or it could be more sophisticated.  I notice that my
> >>> > versions of libjulia.so on an Ubuntu 64-bit system are linked against both
> >>> > libz.so and liblzma.so
> >>> > 
> >>> > $ ldd /usr/lib/x86_64-linux-gnu/julia/libjulia.so
> >>> > linux-vdso.so.1 =>  (0x00007fff5214f000)
> >>> > libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f62932ee000)
> >>> > libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f62930d5000)
> >>> > libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6292dce000)
> >>> > librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f6292bc6000)
> >>> > libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f62929a8000)
> >>> > libunwind.so.8 => /usr/lib/x86_64-linux-gnu/libunwind.so.8 (0x00007f629278c000)
> >>> > libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6292488000)
> >>> > libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f6292272000)
> >>> > libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6291eab000)
> >>> > /lib64/ld-linux-x86-64.so.2 (0x00007f62944b3000)
> >>> > liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f6291c89000)
> >>> > 
> >>> > 
> >>> > AFAIK the user-level interface to gzip requires the GZip package.  Unless I
> >>> > have missed something (always a possibility) there is no user-level
> >>> > interface to liblzma in Julia.  If the library is going to be linked
> >>> > anyway, would it make sense to provide a user-level interface in Julia?
