Just a small idea. I would make makedict also take a type so that 
makedict(1024) also works. Alternatively using typemax(UInt8) would remove 
the dependence on the magic number.

On Saturday, 2 January 2016 03:05:19 UTC+9, Erik Schnetter wrote:
>
> Julia distinguishes between two kinds of strings, ASCIIString and 
> UTF8String, both subtype of AbstractString. Your code seems to assume 
> that a character can handle values between 0 and 255, which is not 
> true in Julia. Also, the way the code decomposes the uncompressed 
> string rather assumes that each element of the uncompressed string is 
> a byte, not a code point, and the resulting strings stored in the 
> dictionary are not necessarily value UTF8. Of course, your algorithm 
> is completely valid, however, it may be better to use the types UInt8 
> and Vector{UInt8} instead of Char and AbstractString. An important 
> respective type is `IOBuffer`, and an important function is 
> `takebuf_array`. 
>
> Smaller points are that Julia prefers the type `Int` over `UInt32` for 
> sizes. I would use `Int` instead of both `UInt32` and `Int64`, without 
> loss of generality. 
>
> This does not explain the large amount of allocations. My assumption 
> -- untested, but corroborated by profiling -- is that these come from 
> the string concatenation operator `*` in line 4 of `lzw_compress`. 
> Instead of repeatedly appending characters to a string, I would use an 
> IOBuffer, where appending is much cheaper. You can then use 
> `takebuf_array` to obtain the buffer's byte array. The dictionary 
> would have type `Dict{Vector{UInt8}, Int}`. 
>
> Here is my respective version: 
>
> function makedict(size) 
>     dict = Dict{Vector{UInt8},Int}() 
>     for i in 0:size-1 
>         dict[UInt8[i]] = i 
>     end 
>     dict 
> end 
>
> function lzw_compress(uncompressed) 
>     dict = makedict(256) 
>     w, widx = IOBuffer(), -1 
>     result = Int[] 
>     for b in IOBuffer(uncompressed1).data 
>         write(w, b) 
>         newidx = get(dict, w.data, -1) 
>         if newidx != -1 
>             widx = newidx 
>         else 
>             push!(result, widx) 
>             dict[takebuf_array(w)] = length(dict) 
>             write(w, b) 
>             widx = dict[w.data] 
>         end 
>     end 
>
>     widx!=-1 && push!(result, widx) 
>     result 
> end 
>
> In C++, Julia's IOBuffer would be a stringstream, in Java, it would be 
> a StringBuffer. 
>
> -erik 
>
>
> On Fri, Jan 1, 2016 at 9:59 AM, Robert Feldt <[email protected] 
> <javascript:>> wrote: 
> > I was planning to add a Julia entry for LZW compression at Rosetta Code, 
> see 
> > http://rosettacode.org/wiki/LZW_compression , and did a first Julia 
> > implementation below. It allocates quite some mem though so ideas for 
> > speeding it up and/or clarifying the code before I submit it would be 
> > appreciated. Thanks! 
> > 
> > julia> function makedict(size) 
> >         dict = Dict{AbstractString,UInt32}() 
> >         [dict[string(Char(i))] = i for i in 0:(size-1)] 
> >         dict 
> >        end 
> > makedict (generic function with 1 method) 
> > 
> > julia> function lzw_compress(uncompressed) 
> >         dict, w, result = makedict(256), "", Int64[] 
> >         for i in 1:length(uncompressed) 
> >         wc = w * uncompressed[i:i] 
> >         if haskey(dict, wc) 
> >         w = wc 
> >         else 
> >         push!(result, dict[w]) 
> >         dict[wc] = length(dict) 
> >         w = uncompressed[i:i] 
> >         end 
> >         end 
> > 
> >         isempty(w) || push!(result, dict[w]) 
> >         result 
> >        end 
> > lzw_compress (generic function with 1 method) 
> > 
> > julia> t = "TOBEORNOTTOBEORTOBEORNOT" 
> > "TOBEORNOTTOBEORTOBEORNOT" 
> > 
> > julia> compressed = lzw_compress(t) 
> > 16-element Array{Int64,1}: 
> >   84 
> >   79 
> >   66 
> >   69 
> >   79 
> >   82 
> >   78 
> >   79 
> >   84 
> >  256 
> >  258 
> >  260 
> >  265 
> >  259 
> >  261 
> >  263 
> > 
> > julia> tl = t ^ 10000; 
> > 
> > julia> @time lzw_compress(tl); 
> >   0.101714 seconds (971.11 k allocations: 54.181 MB, 28.92% gc time) 
>
>
>
> -- 
> Erik Schnetter <[email protected] <javascript:>> 
> http://www.perimeterinstitute.ca/personal/eschnetter/ 
>

Reply via email to