HDF5/JLD does support compression: https://github.com/timholy/HDF5.jl/blob/master/doc/hdf5.md#reading-and-writing-data
But it's not turned on by default. Matlab uses compression by default, and I've found it's a huge bottleneck in terms of performance (http://www.mathworks.com/matlabcentral/fileexchange/39721-save-mat-files-more-quickly). But perhaps there's a good middle ground. It would take someone doing a little experimentation to see what the compromises are.

--Tim

On Tuesday, September 02, 2014 08:30:39 AM Douglas Bates wrote:
> Now that the JLD format can handle DataFrame objects I would like to switch
> from storing data sets in .RData format to .jld format. Datasets stored in
> .RData format are compressed after they are written. The default
> compression is gzip. Bzip2 and xz compression are also available. The
> compression can make a substantial difference in the file size because the
> data values are often highly repetitive.
>
> JLD is different in scope in that .jld files can be queried using external
> programs like h5ls and the files can have new data added or existing data
> edited or removed. The .RData format is an archival format. Once the file
> is written it cannot be modified in place.
>
> Given these differences I can appreciate that JLD files are not compressed.
> Nevertheless I think it would be useful to adopt a convention in the JLD
> module for accessing data from files with a .jld.xz or .jld.7z extension.
> It could be as simple as uncompressing the files in a temporary directory,
> reading then removing, or it could be more sophisticated.
> I notice that my versions of libjulia.so on an Ubuntu 64-bit system are
> linked against both libz.so and liblzma.so
>
> $ ldd /usr/lib/x86_64-linux-gnu/julia/libjulia.so
> linux-vdso.so.1 => (0x00007fff5214f000)
> libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f62932ee000)
> libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f62930d5000)
> libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6292dce000)
> librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f6292bc6000)
> libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f62929a8000)
> libunwind.so.8 => /usr/lib/x86_64-linux-gnu/libunwind.so.8 (0x00007f629278c000)
> libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6292488000)
> libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f6292272000)
> libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6291eab000)
> /lib64/ld-linux-x86-64.so.2 (0x00007f62944b3000)
> liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f6291c89000)
>
>
> AFAIK the user-level interface to gzip requires the GZip package. Unless I
> have missed something (always a possibility) there is no user-level
> interface to liblzma in Julia. If the library is going to be linked
> anyway, would it make sense to provide a user-level interface in Julia?
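
The simple version of the convention proposed above (uncompress a .jld.xz file into a temporary directory, read it, then remove the copy) might look roughly like the sketch below. It is written in Python only because its standard library already wraps liblzma; it is not a JLD or HDF5.jl API. The names `read_compressed` and `read_bytes` are hypothetical, and `read_bytes` merely stands in for whatever real reader would open the decompressed file.

```python
import lzma
import os
import shutil
import tempfile


def read_compressed(path, reader):
    """Decompress an .xz file into a temporary directory, call `reader`
    on the decompressed copy, then remove the copy."""
    if not path.endswith(".xz"):
        return reader(path)  # not compressed: read in place
    tmpdir = tempfile.mkdtemp()
    try:
        # Drop the ".xz" suffix so the reader sees e.g. "example.jld".
        plain = os.path.join(tmpdir, os.path.basename(path)[:-3])
        with lzma.open(path, "rb") as src, open(plain, "wb") as dst:
            shutil.copyfileobj(src, dst)
        return reader(plain)
    finally:
        shutil.rmtree(tmpdir)  # remove the temporary copy, even on error


def read_bytes(path):
    # Hypothetical stand-in for a real .jld reader.
    with open(path, "rb") as f:
        return f.read()


# Demo with made-up, highly repetitive data (not a real .jld file).
payload = b"treatment,control,control,treatment\n" * 10000
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "example.jld.xz")
with lzma.open(archive, "wb") as f:
    f.write(payload)
compressed_size = os.path.getsize(archive)
restored = read_compressed(archive, read_bytes)
shutil.rmtree(workdir)
```

Comparing `compressed_size` with `len(payload)` gives a feel for how much repetitive values shrink under xz. The cost of this approach is a full decompression pass on every read, and the compressed file loses the in-place editing and h5ls querying that a plain .jld file allows.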