Would it be reasonable to create a Blosc package, or is it best to incorporate it directly into the HDF5 package? If a separate package is reasonable I could start on it, as I was the one who suggested this in the first place.
On Tuesday, September 2, 2014 2:43:15 PM UTC-5, Tim Holy wrote:
> All these testimonials do make it sound promising. Even three-fold
> compression is a pretty big deal.
>
> One disadvantage to compression is that it makes mmap impossible. But,
> since HDF5 supports hyperslabs, that's not as big a deal as it would
> have been.
>
> --Tim
>
> On Tuesday, September 02, 2014 12:11:55 PM Jake Bolewski wrote:
> > I've used Blosc in the past with great success. Oftentimes it is faster
> > than the uncompressed version if IO is the bottleneck. The compression
> > ratios are not great but that is really not the point.
> >
> > On Tuesday, September 2, 2014 2:09:20 PM UTC-4, Stefan Karpinski wrote:
> > > That looks pretty sweet. It seems to avoid a lot of the pitfalls of
> > > naively compressing data files while still getting the benefits. It
> > > would be great to support that in JLD, maybe even turned on by default.
> > >
> > > On Tue, Sep 2, 2014 at 1:35 PM, Kevin Squire <[email protected]> wrote:
> > >> Just to hype blosc a little more, see
> > >>
> > >> http://www.blosc.org/blosc-in-depth.html
> > >>
> > >> The main feature is that data is chunked so that the compressed chunk
> > >> size fits into L1 cache, and is then decompressed and used there. There
> > >> are a few more buzzwords (multithreading, simd) in the link above.
> > >> Worth exploring where this might be useful in Julia.
> > >>
> > >> Cheers,
> > >>
> > >> Kevin
> > >>
> > >> On Tuesday, September 2, 2014, Tim Holy <[email protected]> wrote:
> > >>> HDF5/JLD does support compression:
> > >>>
> > >>> https://github.com/timholy/HDF5.jl/blob/master/doc/hdf5.md#reading-and-writing-data
> > >>>
> > >>> But it's not turned on by default. Matlab uses compression by default,
> > >>> and I've found it's a huge bottleneck in terms of performance
> > >>> (http://www.mathworks.com/matlabcentral/fileexchange/39721-save-mat-files-more-quickly).
> > >>> But perhaps there's a good middle ground. It would take someone
> > >>> doing a little experimentation to see what the compromises are.
> > >>>
> > >>> --Tim
> > >>>
> > >>> On Tuesday, September 02, 2014 08:30:39 AM Douglas Bates wrote:
> > >>> > Now that the JLD format can handle DataFrame objects I would like to
> > >>> > switch from storing data sets in .RData format to .jld format.
> > >>> > Datasets stored in .RData format are compressed after they are
> > >>> > written. The default compression is gzip. Bzip2 and xz compression
> > >>> > are also available. The compression can make a substantial
> > >>> > difference in the file size because the data values are often
> > >>> > highly repetitive.
> > >>> >
> > >>> > JLD is different in scope in that .jld files can be queried using
> > >>> > external programs like h5ls and the files can have new data added or
> > >>> > existing data edited or removed. The .RData format is an archival
> > >>> > format. Once the file is written it cannot be modified in place.
> > >>> >
> > >>> > Given these differences I can appreciate that JLD files are not
> > >>> > compressed. Nevertheless I think it would be useful to adopt a
> > >>> > convention in the JLD module for accessing data from files with a
> > >>> > .jld.xz or .jld.7z extension. It could be as simple as uncompressing
> > >>> > the files in a temporary directory, reading then removing, or it
> > >>> > could be more sophisticated. I notice that my versions of
> > >>> > libjulia.so on an Ubuntu 64-bit system are linked against both
> > >>> > libz.so and liblzma.so
> > >>> >
> > >>> > $ ldd /usr/lib/x86_64-linux-gnu/julia/libjulia.so
> > >>> > linux-vdso.so.1 => (0x00007fff5214f000)
> > >>> > libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f62932ee000)
> > >>> > libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f62930d5000)
> > >>> > libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6292dce000)
> > >>> > librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f6292bc6000)
> > >>> > libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f62929a8000)
> > >>> > libunwind.so.8 => /usr/lib/x86_64-linux-gnu/libunwind.so.8 (0x00007f629278c000)
> > >>> > libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6292488000)
> > >>> > libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f6292272000)
> > >>> > libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6291eab000)
> > >>> > /lib64/ld-linux-x86-64.so.2 (0x00007f62944b3000)
> > >>> > liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f6291c89000)
> > >>> >
> > >>> > AFAIK the user-level interface to gzip requires the GZip package.
> > >>> > Unless I have missed something (always a possibility) there is no
> > >>> > user-level interface to liblzma in Julia. If the library is going to
> > >>> > be linked anyway, would it make sense to provide a user-level
> > >>> > interface in Julia?
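
For reference, the "uncompress into a temporary directory, read, then remove" convention from Douglas's original message could look roughly like the sketch below. It is written in Python rather than Julia only because Python's standard library already ships an `lzma` module; the function name `read_xz_via_tempfile` and the `reader` callback are hypothetical stand-ins (for a .jld.xz file, `reader` would be whatever opens the uncompressed .jld file), not part of JLD or HDF5.

```python
import lzma
import os
import shutil
import tempfile


def read_xz_via_tempfile(xz_path, reader):
    """Decompress xz_path into a temporary directory, call reader(path), clean up.

    `reader` is a hypothetical callback that opens the uncompressed file
    and returns whatever the caller wants to keep from it.
    """
    tmpdir = tempfile.mkdtemp()
    try:
        # Strip the trailing ".xz" so the reader sees the original file name.
        base = os.path.basename(xz_path)
        if base.endswith(".xz"):
            base = base[:-3]
        plain_path = os.path.join(tmpdir, base)
        # Stream the decompression so large files are never held fully in memory.
        with lzma.open(xz_path, "rb") as src, open(plain_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        return reader(plain_path)
    finally:
        # Remove the temporary copy whether or not reading succeeded.
        shutil.rmtree(tmpdir, ignore_errors=True)
```

The obvious cost of this approach is a full extra copy on disk for every read, which is the kind of overhead that chunk-level compression inside the file avoids.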
