On Fri, Apr 11, 2014 at 6:44 AM, Laszlo Hars <[email protected]> wrote:
> note that the running time does not change with a partial loop unroll, like
> this:
> ~~~
> function signed_loop{D<:Unsigned, A<:Unsigned}(::Type{D}, r::A, data,
> table::Vector{A})
>     local j = 0
>      for i = 1 : div(length(data),20)
>         r = (r >>> 8) $ table[1 + (data[j+=1]$convert(D,r))]
[...]
>         r = (r >>> 8) $ table[1 + (data[j+=1]$convert(D,r))]
>     end
>     return r
> end
> ~~~

In that case, it's probably because zlib processes the bytes four at
a time, using four different CRC tables.  This is quite distinct from
loop unrolling, and can have a larger effect because it removes some
of the data dependency between iterations.  It looks something like
this (very untested!  I didn't have time to figure out how to make
the four different CRC tables):

data4 = reinterpret(Uint32, data)  # note, need special cases for trailing bytes
for i = 1:length(data4)
    word::Uint32 = data4[i]
    r = r $ word   # fold four input bytes into the CRC at once
    # one lookup per byte of r; each byte is masked with &, not xor
    r = table3[1 + (r & 0xff)] $ table2[1 + ((r >> 8) & 0xff)] $
        table1[1 + ((r >> 16) & 0xff)] $ table0[1 + (r >> 24)]
end
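
For the four tables, here is a minimal sketch of how they could be
built and used. It is in Python rather than Julia only so that the
result can be compared against zlib.crc32. The names make_tables and
crc32_by4 are made up for this example, and it assumes the standard
reflected CRC-32 polynomial 0xEDB88320, little-endian word loads, and
the usual 0xFFFFFFFF initial value and final xor. Table k is table0
advanced through k extra zero bytes, as far as I can tell the same
recurrence zlib's crc32.c uses.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial (assumed; the one zlib uses)

def make_tables():
    """Build table0..table3 for the four-bytes-at-a-time loop."""
    # table0 is the ordinary byte-at-a-time CRC table.
    table0 = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table0.append(c)
    # table k is table k-1 pushed through one more zero byte.
    tables = [table0]
    for k in range(1, 4):
        prev = tables[k - 1]
        tables.append([(prev[n] >> 8) ^ table0[prev[n] & 0xFF]
                       for n in range(256)])
    return tables

def crc32_by4(data, tables=None):
    """CRC-32 of a bytes object, consuming four bytes per iteration."""
    table0, table1, table2, table3 = tables or make_tables()
    r = 0xFFFFFFFF
    nwords = len(data) // 4
    for i in range(nwords):
        word = int.from_bytes(data[4 * i:4 * i + 4], "little")
        r ^= word
        r = (table3[r & 0xFF] ^ table2[(r >> 8) & 0xFF] ^
             table1[(r >> 16) & 0xFF] ^ table0[r >> 24])
    # Trailing bytes: fall back to the one-byte-at-a-time update.
    for b in data[4 * nwords:]:
        r = (r >> 8) ^ table0[(r ^ b) & 0xFF]
    return r ^ 0xFFFFFFFF

if __name__ == "__main__":
    sample = b"The quick brown fox jumps over the lazy dog"
    assert crc32_by4(sample) == zlib.crc32(sample)
```

The four lookups inside the loop do not depend on each other, only on
the r from the previous iteration, which is where the reduced data
dependency comes from.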
