Certainly it would be more than welcome in HDF5. If there is call for a 
standalone implementation, that would be fine too.

Best,
--Tim
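Two points in the thread below can be shown with one small example. Jake notes that a standalone package would help for compressing in-memory binary blobs, and Kevin describes Blosc's chunking. This is not Blosc or its API. It is a Python sketch that uses the standard-library `zlib` codec as a stand-in, and the 32 KiB chunk size is an assumption. Each chunk is compressed independently, so a byte range can be read by decompressing only the chunks that overlap it. HDF5's chunked, compressed datasets rely on the same property when a hyperslab is read.

```python
import zlib

# Assumed chunk size for this sketch; real Blosc chooses its own block sizes.
CHUNK_SIZE = 32 * 1024


def compress_chunks(data: bytes, chunk_size: int = CHUNK_SIZE, level: int = 1) -> list:
    """Split `data` into fixed-size chunks and compress each one independently.

    Level 1 favours speed over ratio, in the spirit of Blosc.
    """
    return [
        zlib.compress(data[i:i + chunk_size], level)
        for i in range(0, len(data), chunk_size)
    ]


def read_range(chunks: list, start: int, stop: int, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return the original bytes [start, stop), decompressing only overlapping chunks.

    Assumes 0 <= start <= stop <= original length.
    """
    if start >= stop:
        return b""
    first = start // chunk_size
    last = (stop - 1) // chunk_size
    joined = b"".join(zlib.decompress(chunks[k]) for k in range(first, last + 1))
    offset = first * chunk_size  # position of chunk `first` in the original data
    return joined[start - offset:stop - offset]


if __name__ == "__main__":
    # Repetitive stand-in for a serialized value held in memory.
    blob = b"factor-level-A," * 100_000
    chunks = compress_chunks(blob)
    total = sum(len(c) for c in chunks)
    print(f"{len(blob)} bytes -> {total} bytes in {len(chunks)} chunks")
    # Reading 50 bytes decompresses one chunk, not the whole buffer.
    assert read_range(chunks, 100_000, 100_050) == blob[100_000:100_050]
```

The trade-off is that smaller chunks allow finer-grained reads but give the codec less context, so the compression ratio usually drops.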

On Tuesday, September 02, 2014 12:58:24 PM Jake Bolewski wrote:
> It would be best to incorporate it into the HDF5 package. A standalone Julia
> package would be useful if you wanted to do the same sort of compression on
> Julia binary blobs, such as serialized Julia values in an IOBuffer.
> 
> On Tuesday, September 2, 2014 3:47:33 PM UTC-4, Douglas Bates wrote:
> > Would it be reasonable to create a Blosc package, or is it best to
> > incorporate it directly into the HDF5 package? If a separate package is
> > reasonable, I could start on it, as I was the one who suggested this in the
> > first place.
> > 
> > On Tuesday, September 2, 2014 2:43:15 PM UTC-5, Tim Holy wrote:
> >> All these testimonials do make it sound promising. Even three-fold
> >> compression is a pretty big deal.
> >> 
> >> One disadvantage to compression is that it makes mmap impossible. But,
> >> since HDF5 supports hyperslabs, that's not as big a deal as it would have
> >> been.
> >> 
> >> --Tim
> >> 
> >> On Tuesday, September 02, 2014 12:11:55 PM Jake Bolewski wrote:
> >> > I've used Blosc in the past with great success. Oftentimes it is faster
> >> > than the uncompressed version if IO is the bottleneck. The compression
> >> > ratios are not great, but that is really not the point.
> >> > 
> >> > On Tuesday, September 2, 2014 2:09:20 PM UTC-4, Stefan Karpinski wrote:
> >> > > That looks pretty sweet. It seems to avoid a lot of the pitfalls of
> >> > > naively compressing data files while still getting the benefits. It would
> >> > > be great to support that in JLD, maybe even turned on by default.
> >> > > 
> >> > > On Tue, Sep 2, 2014 at 1:35 PM, Kevin Squire <[email protected]> wrote:
> >> > >> Just to hype blosc a little more, see
> >> > >> 
> >> > >> http://www.blosc.org/blosc-in-depth.html
> >> > >> 
> >> > >> The main feature is that data is chunked so that the compressed chunk
> >> > >> size fits into L1 cache, and is then decompressed and used there. There
> >> > >> are a few more buzzwords (multithreading, SIMD) in the link above. Worth
> >> > >> exploring where this might be useful in Julia.
> >> > >> 
> >> > >> Cheers,
> >> > >> 
> >> > >>   Kevin
> >> > >> 
> >> > >> On Tuesday, September 2, 2014, Tim Holy <[email protected]> wrote:
> >> > >>> HDF5/JLD does support compression:
> >> > >>> https://github.com/timholy/HDF5.jl/blob/master/doc/hdf5.md#reading-and-writing-data
> >> > >>> 
> >> > >>> But it's not turned on by default. Matlab uses compression by default,
> >> > >>> and I've found it's a huge bottleneck in terms of performance
> >> > >>> (http://www.mathworks.com/matlabcentral/fileexchange/39721-save-mat-files-more-quickly).
> >> > >>> But perhaps there's a good middle ground. It would take someone doing a
> >> > >>> little experimentation to see what the compromises are.
> >> > >>> 
> >> > >>> --Tim
> >> > >>> 
> >> > >>> On Tuesday, September 02, 2014 08:30:39 AM Douglas Bates wrote:
> >> > >>> > Now that the JLD format can handle DataFrame objects, I would like to
> >> > >>> > switch from storing data sets in .RData format to .jld format. Data
> >> > >>> > sets stored in .RData format are compressed after they are written.
> >> > >>> > The default compression is gzip; bzip2 and xz compression are also
> >> > >>> > available. The compression can make a substantial difference in the
> >> > >>> > file size because the data values are often highly repetitive.
> >> > >>> > 
> >> > >>> > JLD is different in scope in that .jld files can be queried using
> >> > >>> > external programs like h5ls, and the files can have new data added or
> >> > >>> > existing data edited or removed. The .RData format is an archival
> >> > >>> > format: once the file is written, it cannot be modified in place.
> >> > >>> > 
> >> > >>> > Given these differences, I can appreciate that JLD files are not
> >> > >>> > compressed. Nevertheless, I think it would be useful to adopt a
> >> > >>> > convention in the JLD module for accessing data from files with a
> >> > >>> > .jld.xz or .jld.7z extension. It could be as simple as uncompressing
> >> > >>> > the file in a temporary directory, reading it, and then removing it, or
> >> > >>> > it could be more sophisticated. I notice that my versions of
> >> > >>> > libjulia.so on an Ubuntu 64-bit system are linked against both libz.so
> >> > >>> > and liblzma.so:
> >> > >>> > 
> >> > >>> > $ ldd /usr/lib/x86_64-linux-gnu/julia/libjulia.so
> >> > >>> > linux-vdso.so.1 =>  (0x00007fff5214f000)
> >> > >>> > libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f62932ee000)
> >> > >>> > libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f62930d5000)
> >> > >>> > libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6292dce000)
> >> > >>> > librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f6292bc6000)
> >> > >>> > libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f62929a8000)
> >> > >>> > libunwind.so.8 => /usr/lib/x86_64-linux-gnu/libunwind.so.8 (0x00007f629278c000)
> >> > >>> > libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6292488000)
> >> > >>> > libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f6292272000)
> >> > >>> > libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6291eab000)
> >> > >>> > /lib64/ld-linux-x86-64.so.2 (0x00007f62944b3000)
> >> > >>> > liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f6291c89000)
> >> > >>> > 
> >> > >>> > AFAIK the user-level interface to gzip requires the GZip package.
> >> > >>> > Unless I have missed something (always a possibility), there is no
> >> > >>> > user-level interface to liblzma in Julia. If the library is going to
> >> > >>> > be linked anyway, would it make sense to provide a user-level
> >> > >>> > interface in Julia?
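Douglas's simplest option, "uncompressing the files in a temporary directory, reading then removing", can be sketched as follows. This is a Python illustration using the standard-library `lzma` module, not JLD or HDF5.jl code. The `reader` callback is a hypothetical stand-in for whatever would open the decompressed `.jld` file. Only `.xz` is handled: `.7z` is an archive container that Python's `lzma` module does not read, so a `.jld.7z` file would need a separate tool.

```python
import lzma
import os
import shutil
import tempfile


def read_compressed(path: str, reader):
    """Decompress an .xz file into a temporary directory, read it, then remove it.

    `reader` is a hypothetical callback standing in for the real file reader.
    It should return fully loaded data, not an open handle, because the
    decompressed file is deleted when this function returns.
    """
    name = os.path.basename(path)
    if name.endswith(".xz"):
        name = name[: -len(".xz")]  # example.jld.xz -> example.jld
    # Leaving this `with` block deletes tmpdir and the decompressed file.
    with tempfile.TemporaryDirectory() as tmpdir:
        target = os.path.join(tmpdir, name)
        # Stream the decompression so the whole file never sits in memory.
        with lzma.open(path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        return reader(target)


def read_bytes(path: str) -> bytes:
    """Trivial reader used for the demo below."""
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    payload = b"highly repetitive data values\n" * 10_000
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "example.jld.xz")
        with lzma.open(src, "wb") as f:
            f.write(payload)
        assert read_compressed(src, read_bytes) == payload
```

This approach costs a full decompression and temporary disk space on every read. That is why the "more sophisticated" route, compressing inside the HDF5 file itself as discussed above, keeps partial reads cheap.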

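Tim suggests that "it would take someone doing a little experimentation to see what the compromises are." A minimal harness for that experiment might compare compression ratio and time for the three codecs Douglas mentions (gzip, bzip2, xz). This sketch uses Python's standard-library `zlib` (deflate, the algorithm behind gzip), `bz2`, and `lzma`, each at its library's default level. The synthetic data set is an assumption, and results depend heavily on the data and the machine. This measures whole-buffer compression, not HDF5's per-chunk filters.

```python
import bz2
import lzma
import time
import zlib

# Each entry spells out the library's default level, so the comparison
# reflects out-of-the-box settings.
CODECS = {
    "deflate (gzip)": lambda b: zlib.compress(b, 6),
    "bzip2": lambda b: bz2.compress(b, 9),
    "xz": lambda b: lzma.compress(b, preset=6),
}


def compare_codecs(data: bytes) -> dict:
    """Return {codec name: (compression ratio, seconds to compress)}."""
    results = {}
    for name, compress in CODECS.items():
        t0 = time.perf_counter()
        out = compress(data)
        elapsed = time.perf_counter() - t0
        results[name] = (len(data) / len(out), elapsed)
    return results


if __name__ == "__main__":
    # Synthetic, repetitive column: a categorical variable with three levels.
    levels = [b"control", b"treatment-a", b"treatment-b"]
    data = b",".join(levels[i % 3] for i in range(200_000))
    for name, (ratio, secs) in compare_codecs(data).items():
        print(f"{name:15s} ratio {ratio:8.1f}   time {secs * 1e3:8.2f} ms")
```

For a read-heavy format like JLD, decompression time probably matters more than compression time. It can be measured the same way with `zlib.decompress`, `bz2.decompress`, and `lzma.decompress`.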