Now that the JLD format can handle DataFrame objects I would like to switch 
from storing data sets in .RData format to .jld format.  Datasets stored in 
.RData format are compressed after they are written.  The default 
compression is gzip.  Bzip2 and xz compression are also available.  The 
compression can make a substantial difference in the file size because the 
data values are often highly repetitive.
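
To get a feel for the effect, here is a small sketch in Python (chosen only because its standard library ships gzip, bz2 and lzma modules). The column contents are made up for illustration, and real ratios will depend on the data.

```python
import bz2
import gzip
import lzma

# Hypothetical column: a categorical variable with three levels repeated
# many times, similar to a factor in a typical data set.
levels = ["control", "treatment_a", "treatment_b"]
column = "\n".join(levels[i % 3] for i in range(30_000)).encode("utf-8")

# Compress the same bytes with the three methods mentioned above.
sizes = {
    "raw": len(column),
    "gzip": len(gzip.compress(column)),
    "bzip2": len(bz2.compress(column)),
    "xz": len(lzma.compress(column)),
}

for name, size in sizes.items():
    print(f"{name:>5}: {size} bytes")
```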

JLD is different in scope in that .jld files can be queried using external 
programs like h5ls and the files can have new data added or existing data 
edited or removed.  The .RData format is an archival format.  Once the file 
is written it cannot be modified in place.

Given these differences I can appreciate that JLD files are not compressed.
Nevertheless I think it would be useful to adopt a convention in the JLD
module for accessing data from files with a .jld.xz or .jld.7z extension.
It could be as simple as uncompressing the file into a temporary directory,
reading it and then removing the temporary copy, or it could be more
sophisticated.  I notice that my versions of libjulia.so on an Ubuntu
64-bit system are linked against both libz.so and liblzma.so:

$ ldd /usr/lib/x86_64-linux-gnu/julia/libjulia.so 
linux-vdso.so.1 =>  (0x00007fff5214f000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f62932ee000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f62930d5000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6292dce000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f6292bc6000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f62929a8000)
libunwind.so.8 => /usr/lib/x86_64-linux-gnu/libunwind.so.8 (0x00007f629278c000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6292488000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f6292272000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6291eab000)
/lib64/ld-linux-x86-64.so.2 (0x00007f62944b3000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f6291c89000)
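
The simple version of the convention (uncompress into a temporary directory, read, then remove) could look roughly like the sketch below. It is written in Python only because its standard library already has an lzma module; it is not JLD code. The names `with_uncompressed` and `reader` are made up for illustration, with `reader` standing in for whatever actually opens the .jld file. Only the .xz case is handled here; a .7z archive would need a different tool.

```python
import lzma
import os
import shutil
import tempfile


def with_uncompressed(path, reader):
    """Uncompress an .xz file into a temporary directory, call
    reader(uncompressed_path), then remove the temporary copy.

    `reader` is a placeholder for whatever opens the .jld file.
    """
    if not path.endswith(".xz"):
        # Not compressed: hand the file to the reader unchanged.
        return reader(path)
    with tempfile.TemporaryDirectory() as tmpdir:
        # data.jld.xz -> <tmpdir>/data.jld
        target = os.path.join(tmpdir, os.path.basename(path)[: -len(".xz")])
        with lzma.open(path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # The temporary directory is deleted when this block exits, so
        # the reader must finish with the file before it returns.
        return reader(target)
```

One consequence of this design is that the file is effectively read-only: any changes made to the temporary copy are lost unless it is recompressed before the directory is removed.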


AFAIK the user-level interface to gzip requires the GZip package.  Unless I 
have missed something (always a possibility) there is no user-level 
interface to liblzma in Julia.  If the library is going to be linked 
anyway, would it make sense to provide a user-level interface in Julia? 
