If you want to try Blosc, see this issue for how to build it on 
Julia 0.3.0 (this worked at least on my Mac OS X 10.9):

https://github.com/jakebolewski/Blosc.jl/issues/1

Once it is built, one can compare the Zlib and Blosc compressors:

using Zlib
using Blosc

# Each helper returns the compressed size in bytes at compression level 9.
zliblength(str) = length(Zlib.compress(str, 9, false, true))
lz4length(s) = length(Blosc.compress(convert(Vector{Uint8}, s),
                                     clevel=9, cname=:lz4))
lz4hclength(s) = length(Blosc.compress(convert(Vector{Uint8}, s),
                                       clevel=9, cname=:lz4hc))
bzliblength(s) = length(Blosc.compress(convert(Vector{Uint8}, s),
                                       clevel=9, cname=:zlib))

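# Time a single call of func on input, then print the elapsed time and the
# compression ratio (input length / compressed length; higher is better).
function report(name, func, input)
  tic()
  len = func(input)
  t = toq()
  @printf("%s, time = %.3e seconds, compression ratio = %.3f\n",
          name, t, length(input)/len)
end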

for exponent in 1:7
  n = 10^exponent
  input = Uint8[1:n]
  println("\nInput of length 10^$exponent")
  report("zlib         ", zliblength, input)
  report("zlib in blosc", bzliblength, input)
  report("lz4hc        ", lz4hclength, input)
  report("lz4          ", lz4length, input)
end

which gives output:

Input of length 10^1
zlib         , time = 4.789e-02 seconds, compression ratio = 0.833
zlib in blosc, time = 3.939e-03 seconds, compression ratio = 0.385
lz4hc        , time = 3.256e-02 seconds, compression ratio = 0.385
lz4          , time = 3.482e-03 seconds, compression ratio = 0.385

Input of length 10^2
zlib         , time = 1.211e-04 seconds, compression ratio = 0.980
zlib in blosc, time = 3.801e-06 seconds, compression ratio = 0.862
lz4hc        , time = 1.448e-05 seconds, compression ratio = 0.862
lz4          , time = 3.403e-06 seconds, compression ratio = 0.862

Input of length 10^3
zlib         , time = 8.187e-05 seconds, compression ratio = 3.571
zlib in blosc, time = 5.589e-05 seconds, compression ratio = 3.226
lz4hc        , time = 1.400e-04 seconds, compression ratio = 3.413
lz4          , time = 1.119e-05 seconds, compression ratio = 3.413

Input of length 10^4
zlib         , time = 1.158e-04 seconds, compression ratio = 27.473
zlib in blosc, time = 1.107e-04 seconds, compression ratio = 25.381
lz4hc        , time = 4.732e-05 seconds, compression ratio = 30.395
lz4          , time = 6.572e-06 seconds, compression ratio = 30.395

Input of length 10^5
zlib         , time = 7.319e-04 seconds, compression ratio = 140.252
zlib in blosc, time = 6.519e-04 seconds, compression ratio = 134.590
lz4hc        , time = 2.058e-04 seconds, compression ratio = 146.628
lz4          , time = 2.368e-05 seconds, compression ratio = 146.628

Input of length 10^6
zlib         , time = 4.517e-03 seconds, compression ratio = 238.095
zlib in blosc, time = 4.493e-03 seconds, compression ratio = 236.407
lz4hc        , time = 2.291e-04 seconds, compression ratio = 237.473
lz4          , time = 6.989e-04 seconds, compression ratio = 198.807

Input of length 10^7
zlib         , time = 4.499e-02 seconds, compression ratio = 255.669
zlib in blosc, time = 1.749e-02 seconds, compression ratio = 247.078
lz4hc        , time = 3.146e-02 seconds, compression ratio = 246.299
lz4          , time = 5.670e-03 seconds, compression ratio = 200.489

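It seems that LZ4HC compression in Blosc is sometimes quite a bit faster 
than Zlib, but not always, and its compression ratio is good. 
LZ4 is the fastest in every run except 10^6, but for the two largest inputs 
it compresses less (a ratio of about 200 versus roughly 236 to 256 for the 
others).
For inputs shorter than ~350 bytes there is not always any compression: at 
lengths 10 and 100 every ratio is below 1, and the three Blosc results come 
out at the input length plus 16 bytes (26 and 116), which is consistent with 
Blosc's 16-byte header on data it stores uncompressed.
Note that the input being compressed here is very regular, so this 
evaluation is not a good one and may be misleading about the compression 
ratios to expect on real data. It is just a very rough indication.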
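
A minimal sketch of a somewhat fairer setup, given the caveat above about the 
very regular input. It assumes a current Julia 1.x (UInt8 and the Random 
standard library) rather than the 0.3 syntax used in the script, and the 
names make_inputs and timed_ratio are illustrative, not part of any package. 
It builds same-length inputs with different amounts of redundancy and does an 
untimed warm-up call first, since the first call to a Julia function also 
includes compilation, which probably inflates the 10^1 timings.

```julia
using Random

# Build same-length inputs with different amounts of redundancy.
# Assumes Julia 1.x; the function and label names are hypothetical.
function make_inputs(n::Integer; seed::Integer = 1)
    rng = MersenneTwister(seed)
    ramp = UInt8[i % UInt8 for i in 1:n]     # regular ramp that wraps at 256
    noise = rand(rng, UInt8, n)              # random bytes, nearly incompressible
    words = ["julia", "blosc", "zlib", "lz4", "compress", "string"]
    # Text-like input: random words joined by spaces, cut to exactly n bytes.
    soup = Vector{UInt8}(join(rand(rng, words, cld(n, 3)), ' '))[1:n]
    return [("ramp", ramp), ("random", noise), ("word soup", soup)]
end

# Time `func` on `input` and return (best time in seconds, compression ratio).
# `func` is any function that returns the compressed length in bytes.
function timed_ratio(func, input; repeats::Integer = 5)
    len = func(input)                        # warm-up call, not timed
    best = Inf
    for _ in 1:repeats
        t0 = time_ns()
        len = func(input)
        best = min(best, (time_ns() - t0) / 1e9)
    end
    return best, length(input) / len
end
```

To use it, pass one of the length helpers from the script (ported to whatever 
Julia version you run) as func, for example 
timed_ratio(lz4length, data) for each (name, data) in make_inputs(10^5).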

Cheers,

Robert



On Monday, 10 November 2014 at 09:49:54 UTC+1, Robert Feldt wrote:
>
> For a project I need fast string compression accessible from Julia. I have 
> found:
>
> * GZip.jl, file-based access to gzip compression
>   https://github.com/JuliaLang/GZip.jl
>
> * Zlib.jl, in-memory access to gzip compression
>   https://github.com/dcjones/Zlib.jl
>
> * There has been talk about doing a Julia package for Blosc (blosc.org), 
> and I found this, but I am not sure it works:
>   https://github.com/jakebolewski/Blosc.jl
>   https://groups.google.com/forum/#!topic/julia-users/eT5_h9zfT5k
>
> If anyone knows of more/other compression packages useable from Julia, 
> please share in this thread. This way people can get a more up-to-date 
> view. 
> Compression is a basic building block for a lot of different things, so it 
> is good if we have many options in Julia. It would be very nice to have 
> access to liblzma, xz, paq etc. in the long term.
>
> If one just needs to estimate the LZ76 complexity there is a pure Julia 
> implementation here:
>
> https://github.com/robertfeldt/InfoTheory.jl/blob/master/spikes/lempel_ziv_76_complexity.jl
> but it performs badly on long strings compared to Zlib, so it is probably 
> not very useful.
>
> Thanks,
>
> Robert Feldt
>
