Re: [julia-users] Re: bit-twiddling micro benchmark

Stefan Karpinski Sun, 20 Apr 2014 13:14:51 -0700

That's really impressive that you managed to get it that fast.


On Sun, Apr 20, 2014 at 1:35 PM, andrew cooke <[email protected]> wrote:

>
> Just for the record - multiple tables and unrolling in Julia now beats C
> (very slightly).
>
> Tim's @nexprs macro generally helps with the unrolling (although I seem to
> have hit a bug or misunderstanding in one particular case, so am having to
> copy + paste in one place).
>
> Thanks,
> Andrew
>
>
> On Thursday, 10 April 2014 19:52:03 UTC-3, andrew cooke wrote:
>>
>>
>> huh.  i had forgotten about this.
>>
>> i'll try four tables.  it shouldn't be that hard to add (although there's
>> going to be extra book-keeping - it's not an obvious gain to me).
>>
>> cheers,
>> andrew
>>
>> On Thursday, 10 April 2014 19:08:21 UTC-3, Chris Foster wrote:
>>>
>>> On Fri, Apr 11, 2014 at 6:44 AM, Laszlo Hars <[email protected]>
>>> wrote:
>>> > note that the running time does not change with a partial loop unroll,
>>> > like this:
>>> > ~~~
>>> > function signed_loop{D<:Unsigned, A<:Unsigned}(::Type{D}, r::A, data,
>>> > table::Vector{A})
>>> >     local j = 0
>>> >     for i = 1 : div(length(data),20)
>>> >         r = (r >>> 8) $ table[1 + (data[j+=1]$convert(D,r))]
>>> [...]
>>> >         r = (r >>> 8) $ table[1 + (data[j+=1]$convert(D,r))]
>>> >     end
>>> >     return r
>>> > end
>>> > ~~~
>>>
>>> In that case, it's probably because zlib is processing the bytes four
>>> at a time, using four different CRC tables.  This is quite distinct
>>> from the loop unrolling, and can have a larger effect because it
>>> removes some of the data dependency between iterations.  It looks
>>> something like this (very untested!  I didn't have time to figure out
>>> how to make the four different CRC tables.)
>>>
>>> # note: need special cases for trailing bytes
>>> data4 = reinterpret(Uint32, data)
>>> for i = 1:length(data4)
>>>     word::Uint32 = data4[i]
>>>     r = r $ word
>>>     r = table3[1 + (r & 0xff)] $ table2[1 + ((r >> 8) & 0xff)] $
>>>         table1[1 + ((r >> 16) & 0xff)] $ table0[1 + (r >> 24)]
>>> end
>>>
>>
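
For anyone reading this thread later: the four-table idea quoted above can be
sketched roughly as follows. This is an illustration only, not the code Andrew
benchmarked. It assumes the standard reflected CRC-32 polynomial 0xEDB88320,
and it uses current Julia syntax (`⊻` and `UInt32` where the thread has `$` and
`Uint32`). The names `make_tables`, `crc32_bytewise` and `crc32_slice4` are
made up for this example. Words are assembled byte by byte rather than via
`reinterpret`, so the result does not depend on the machine's byte order.

```julia
# Build four lookup tables for the reflected CRC-32 (assumed polynomial).
function make_tables(poly::UInt32 = 0xedb88320)
    tables = [zeros(UInt32, 256) for _ in 1:4]
    # Table 1 is the ordinary byte-at-a-time table.
    for n in 0:255
        c = UInt32(n)
        for _ in 1:8
            c = isodd(c) ? (c >>> 1) ⊻ poly : c >>> 1
        end
        tables[1][n + 1] = c
    end
    # Each further table pushes the previous entry through one more zero byte.
    for k in 2:4, n in 1:256
        prev = tables[k - 1][n]
        tables[k][n] = (prev >>> 8) ⊻ tables[1][1 + (prev & 0xff)]
    end
    return tables
end

# Reference implementation: one byte and one table lookup per iteration.
function crc32_bytewise(data::AbstractVector{UInt8}, t0::Vector{UInt32})
    r = 0xffffffff
    for b in data
        r = (r >>> 8) ⊻ t0[1 + ((r ⊻ b) & 0xff)]
    end
    return ~r
end

# Four bytes per iteration: the four lookups do not depend on each other.
function crc32_slice4(data::AbstractVector{UInt8}, t::Vector{Vector{UInt32}})
    r = 0xffffffff
    nwords = length(data) ÷ 4
    for i in 0:(nwords - 1)
        j = 4 * i
        # Assemble a little-endian 32-bit word from four bytes.
        word = UInt32(data[j + 1]) | (UInt32(data[j + 2]) << 8) |
               (UInt32(data[j + 3]) << 16) | (UInt32(data[j + 4]) << 24)
        r ⊻= word
        r = t[4][1 + (r & 0xff)] ⊻ t[3][1 + ((r >>> 8) & 0xff)] ⊻
            t[2][1 + ((r >>> 16) & 0xff)] ⊻ t[1][1 + (r >>> 24)]
    end
    # Trailing bytes (fewer than four) fall back to the byte-wise step.
    for j in (4 * nwords + 1):length(data)
        r = (r >>> 8) ⊻ t[1][1 + ((r ⊻ data[j]) & 0xff)]
    end
    return ~r
end
```

A quick sanity check is to compare both functions on the usual CRC-32 test
string "123456789", whose check value is 0xcbf43926:

```julia
t = make_tables()
data = collect(codeunits("123456789"))
crc32_bytewise(data, t[1]) == crc32_slice4(data, t) == 0xcbf43926
```

Whether this beats the single-table loop depends on the machine; the extra
tables take 4 KiB rather than 1 KiB of cache.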
