I think it would be very much in line with our general ethos that the default
thing we do is the fastest possible thing, and it seems like Blosc is that.
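
A back-of-the-envelope sketch of why a compressed read can beat an
uncompressed one when the disk is the bottleneck, as claimed in the quoted
message below. The function names are made up for illustration, the model
assumes reading and decompressing happen one after the other with no overlap,
and any numbers plugged in are hypothetical, not measurements of Blosc or any
other codec.

```python
def read_time_uncompressed(size_mb, disk_mb_per_s):
    """Seconds to read size_mb of raw data straight from disk."""
    return size_mb / disk_mb_per_s


def read_time_compressed(size_mb, disk_mb_per_s, ratio, decompress_mb_per_s):
    """Seconds to read the compressed file and then decompress it.

    ratio is uncompressed size / compressed size (2.0 means half the bytes
    on disk). decompress_mb_per_s is measured in MB of *output* per second.
    Assumption: no overlap between I/O and decompression.
    """
    io_seconds = (size_mb / ratio) / disk_mb_per_s
    cpu_seconds = size_mb / decompress_mb_per_s
    return io_seconds + cpu_seconds


def compression_wins(size_mb, disk_mb_per_s, ratio, decompress_mb_per_s):
    """True when the compressed path is strictly faster than the raw path."""
    return (
        read_time_compressed(size_mb, disk_mb_per_s, ratio, decompress_mb_per_s)
        < read_time_uncompressed(size_mb, disk_mb_per_s)
    )
```

Under this model even a modest ratio pays off as long as the decompressor
produces output much faster than the disk delivers bytes; a slow decompressor
reverses the result.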


On Tue, Sep 2, 2014 at 3:11 PM, Jake Bolewski <[email protected]>
wrote:

> I've used Blosc in the past with great success.  Oftentimes it is faster
> than the uncompressed version if IO is the bottleneck.  The compression
> ratios are not great but that is really not the point.
>
>
> On Tuesday, September 2, 2014 2:09:20 PM UTC-4, Stefan Karpinski wrote:
>
>> That looks pretty sweet. It seems to avoid a lot of the pitfalls of
>> naively compressing data files while still getting the benefits. It would
>> be great to support that in JLD, maybe even turned on by default.
>>
>>
>> On Tue, Sep 2, 2014 at 1:35 PM, Kevin Squire <[email protected]> wrote:
>>
>>> Just to hype blosc a little more, see
>>>
>>> http://www.blosc.org/blosc-in-depth.html
>>>
>>> The main feature is that data is chunked so that the compressed chunk
>>> size fits into L1 cache, and is then decompressed and used there.  There
>>> are a few more buzzwords (multithreading, simd) in the link above. Worth
>>> exploring where this might be useful in Julia.
>>>
>>> Cheers,
>>>   Kevin
>>>
>>>
>>> On Tuesday, September 2, 2014, Tim Holy <[email protected]> wrote:
>>>
>>>> HDF5/JLD does support compression:
>>>> https://github.com/timholy/HDF5.jl/blob/master/doc/hdf5.md#reading-and-writing-data
>>>>
>>>> But it's not turned on by default. Matlab uses compression by default, and
>>>> I've found it's a huge bottleneck in terms of performance
>>>> (http://www.mathworks.com/matlabcentral/fileexchange/39721-save-mat-files-more-quickly).
>>>> But perhaps there's a good middle ground. It would take someone doing a
>>>> little experimentation to see what the compromises are.
>>>>
>>>> --Tim
>>>>
>>>> On Tuesday, September 02, 2014 08:30:39 AM Douglas Bates wrote:
>>>> > Now that the JLD format can handle DataFrame objects I would like to switch
>>>> > from storing data sets in .RData format to .jld format.  Datasets stored in
>>>> > .RData format are compressed after they are written.  The default
>>>> > compression is gzip.  Bzip2 and xz compression are also available.  The
>>>> > compression can make a substantial difference in the file size because the
>>>> > data values are often highly repetitive.
>>>> >
>>>> > JLD is different in scope in that .jld files can be queried using external
>>>> > programs like h5ls and the files can have new data added or existing data
>>>> > edited or removed.  The .RData format is an archival format.  Once the file
>>>> > is written it cannot be modified in place.
>>>> >
>>>> > Given these differences I can appreciate that JLD files are not compressed.
>>>> > Nevertheless I think it would be useful to adopt a convention in the JLD
>>>> > module for accessing data from files with a .jld.xz or .jld.7z extension.
>>>> > It could be as simple as uncompressing the files in a temporary directory,
>>>> > reading then removing, or it could be more sophisticated.  I notice that my
>>>> > versions of libjulia.so on an Ubuntu 64-bit system are linked against both
>>>> > libz.so and liblzma.so
>>>> >
>>>> > $ ldd /usr/lib/x86_64-linux-gnu/julia/libjulia.so
>>>> > linux-vdso.so.1 =>  (0x00007fff5214f000)
>>>> > libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f62932ee000)
>>>> > libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f62930d5000)
>>>> > libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6292dce000)
>>>> > librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f6292bc6000)
>>>> > libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f62929a8000)
>>>> > libunwind.so.8 => /usr/lib/x86_64-linux-gnu/libunwind.so.8 (0x00007f629278c000)
>>>> > libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6292488000)
>>>> > libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f6292272000)
>>>> > libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6291eab000)
>>>> > /lib64/ld-linux-x86-64.so.2 (0x00007f62944b3000)
>>>> > liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f6291c89000)
>>>> >
>>>> >
>>>> > AFAIK the user-level interface to gzip requires the GZip package.  Unless I
>>>> > have missed something (always a possibility) there is no user-level
>>>> > interface to liblzma in Julia.  If the library is going to be linked
>>>> > anyway, would it make sense to provide a user-level interface in Julia?
>>>>
>>>>
>>
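
For reference, the simplest option floated in the original message
(uncompress a .jld.xz file into a temporary directory, read it, then remove
it) could look roughly like the sketch below. It is written in Python with
only the standard library to show the shape of the idea; it is not JLD's API.
The name `read_xz_via_tempdir` and the `reader` callback are hypothetical,
and `reader` stands in for whatever actually opens the decompressed file.

```python
import lzma
import os
import shutil
import tempfile


def read_xz_via_tempdir(xz_path, reader):
    """Decompress xz_path into a temp directory, call reader(path), clean up.

    reader is a hypothetical callback that receives the path of the
    decompressed file and returns whatever was loaded from it. The temporary
    directory and its contents are removed when this function returns, so
    reader must finish reading before it returns.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        # "data.jld.xz" -> "data.jld"; other names are kept unchanged.
        name = os.path.basename(xz_path)
        if name.endswith(".xz"):
            name = name[:-3]
        plain_path = os.path.join(tmpdir, name)

        # Stream the decompressed bytes to disk without holding the whole
        # file in memory.
        with lzma.open(xz_path, "rb") as src, open(plain_path, "wb") as dst:
            shutil.copyfileobj(src, dst)

        return reader(plain_path)
```

The obvious cost of this approach is that the full uncompressed file is
written to disk on every read, and in-place edits to the archive are not
possible, which is the trade-off against chunk-level compression inside the
file itself.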
