If people want to try Blosc, please see this issue for how to build it on Julia 0.3.0 (at least on my Mac OS X 10.9):
https://github.com/jakebolewski/Blosc.jl/issues/1

Once it is built, one can compare the Zlib and Blosc compressors:

    using Zlib
    # Compressed length when calling Zlib.jl directly.
    zliblength(str) = length(Zlib.compress(str,9,false,true))

    using Blosc
    # Compressed length for each Blosc codec (lz4, lz4hc, zlib) at clevel=9.
    lz4length(s) = length(Blosc.compress(convert(Vector{Uint8}, s), clevel=9, cname=:lz4))
    lz4hclength(s) = length(Blosc.compress(convert(Vector{Uint8}, s), clevel=9, cname=:lz4hc))
    bzliblength(s) = length(Blosc.compress(convert(Vector{Uint8}, s), clevel=9, cname=:zlib))

    # Time one call of func(input) and print the time and compression ratio.
    function report(name, func, input)
      tic()
      len = func(input)
      t = toq()
      @printf("%s, time = %.3e seconds, compression ratio = %.3f\n", name, t, length(input)/len)
    end

    for exponent in 1:7
      n = 10^exponent
      input = Uint8[1:n];
      strinput = string(input);  # not used below; the byte vector is what gets compressed
      println("\nInput of length 10^$exponent")
      report("zlib         ", (input) -> zliblength(input), input)
      report("lz4hc        ", (input) -> lz4hclength(input), input)
      report("zlib in blosc", (input) -> bzliblength(input), input)
      report("lz4          ", (input) -> lz4length(input), input)
    end

which gives output:

    Input of length 10^1
    zlib         , time = 4.789e-02 seconds, compression ratio = 0.833
    lz4hc        , time = 3.256e-02 seconds, compression ratio = 0.385
    zlib in blosc, time = 3.939e-03 seconds, compression ratio = 0.385
    lz4          , time = 3.482e-03 seconds, compression ratio = 0.385

    Input of length 10^2
    zlib         , time = 1.211e-04 seconds, compression ratio = 0.980
    lz4hc        , time = 1.448e-05 seconds, compression ratio = 0.862
    zlib in blosc, time = 3.801e-06 seconds, compression ratio = 0.862
    lz4          , time = 3.403e-06 seconds, compression ratio = 0.862

    Input of length 10^3
    zlib         , time = 8.187e-05 seconds, compression ratio = 3.571
    lz4hc        , time = 1.400e-04 seconds, compression ratio = 3.413
    zlib in blosc, time = 5.589e-05 seconds, compression ratio = 3.226
    lz4          , time = 1.119e-05 seconds, compression ratio = 3.413

    Input of length 10^4
    zlib         , time = 1.158e-04 seconds, compression ratio = 27.473
    lz4hc        , time = 4.732e-05 seconds, compression ratio = 30.395
    zlib in blosc, time = 1.107e-04 seconds, compression ratio = 25.381
    lz4          , time = 6.572e-06 seconds, compression ratio = 30.395

    Input of length 10^5
    zlib         , time = 7.319e-04 seconds, compression ratio = 140.252
    lz4hc        , time = 2.058e-04 seconds, compression ratio = 146.628
    zlib in blosc, time = 6.519e-04 seconds, compression ratio = 134.590
    lz4          , time = 2.368e-05 seconds, compression ratio = 146.628

    Input of length 10^6
    zlib         , time = 4.517e-03 seconds, compression ratio = 238.095
    lz4hc        , time = 2.291e-04 seconds, compression ratio = 237.473
    zlib in blosc, time = 4.493e-03 seconds, compression ratio = 236.407
    lz4          , time = 6.989e-04 seconds, compression ratio = 198.807

    Input of length 10^7
    zlib         , time = 4.499e-02 seconds, compression ratio = 255.669
    lz4hc        , time = 3.146e-02 seconds, compression ratio = 246.299
    zlib in blosc, time = 1.749e-02 seconds, compression ratio = 247.078
    lz4          , time = 5.670e-03 seconds, compression ratio = 200.489

It seems that LZ4HC compression in Blosc is sometimes quite a bit faster, but not always. Its compression ratio is good. LZ4 is the fastest in every run except 10^6, but it sometimes compresses a bit less. For inputs shorter than about 350 characters the output is not always smaller than the input.

Note that the input being compressed here is very regular, so this evaluation is not very good and may be misleading about the compression ratios to expect. This is just a very rough indication.

Cheers,

Robert

On Monday, 10 November 2014 at 09:49:54 UTC+1, Robert Feldt wrote:
>
> For a project I need fast string compression accessible from Julia. I have
> found:
>
> * GZip.jl, file-based access to gzip compression
> https://github.com/JuliaLang/GZip.jl
>
> * Zlib.jl, in-memory access to gzip compression
> https://github.com/dcjones/Zlib.jl
>
> * There has been talk about doing a Julia package for Blosc (blosc.org)
> and I found this, but I am not sure it's working:
> https://github.com/jakebolewski/Blosc.jl
> https://groups.google.com/forum/#!topic/julia-users/eT5_h9zfT5k
>
> If anyone knows of more/other compression packages usable from Julia,
> please share in this thread. This way people can get a more up-to-date
> view.
> Compression is a basic building block for a lot of different things, so it
> is good if we have many options in Julia. It would be very nice to have
> access to liblzma, xz, paq etc. long-term.
>
> If one just needs to estimate the LZ76 complexity there is a pure Julia
> implementation here:
>
> https://github.com/robertfeldt/InfoTheory.jl/blob/master/spikes/lempel_ziv_76_complexity.jl
> but it has bad performance for long strings compared to Zlib, so it is
> probably not very useful.
>
> Thanks,
>
> Robert Feldt
>
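
A side note on the timings: each compressor is timed with a single call, and the 10^1 runs are much slower than the 10^2 runs, which probably means the first calls include compilation time. One way to reduce that effect is to make an untimed warm-up call and keep the fastest of a few repetitions. The sketch below is only an illustration of that idea. The helper `timed_min`, its `runs` keyword, and the stand-in workload `bytesum` are hypothetical names that are not in the script above, and the code assumes a recent Julia (1.x) rather than 0.3.0, so `UInt8` and `@elapsed` replace `Uint8` and `tic()`/`toq()`.

```julia
# Hypothetical helper (not part of the original script): time func(input)
# after one untimed warm-up call and return the fastest of `runs` repetitions.
function timed_min(func, input; runs = 5)
    func(input)                  # warm-up call; triggers compilation, not timed
    best = Inf
    for _ in 1:runs
        t = @elapsed func(input) # elapsed wall-clock time in seconds
        best = min(best, t)
    end
    return best
end

# Stand-in workload so the sketch runs without Zlib.jl or Blosc.jl installed.
# In the benchmark above one would pass zliblength, lz4length, etc. instead.
bytesum(v) = sum(Int, v)

input = rand(UInt8, 10^4)        # random bytes instead of the regular 1:n pattern
t = timed_min(bytesum, input)
println("best of 5 runs: ", t, " seconds")
```

Taking the minimum rather than the mean is a design choice: it discards runs disturbed by garbage collection or other processes, at the cost of hiding how variable the timings are.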
