Just for the record - multiple tables and unrolling in Julia now beats C 
(very slightly).

Tim's @nexprs macro generally helps with the unrolling (although I seem to 
have hit a bug or a misunderstanding in one particular case, so I am having 
to copy + paste in one place).

Thanks,
Andrew

On Thursday, 10 April 2014 19:52:03 UTC-3, andrew cooke wrote:
>
>
> huh.  i had forgotten about this.
>
> i'll try four tables.  it shouldn't be that hard to add (although there's 
> going to be extra book-keeping - it's not an obvious gain to me).
>
> cheers,
> andrew
>
> On Thursday, 10 April 2014 19:08:21 UTC-3, Chris Foster wrote:
>>
>> On Fri, Apr 11, 2014 at 6:44 AM, Laszlo Hars wrote: 
>> > note that the running time does not change with a partial loop 
>> > unroll, like this: 
>> > ~~~ 
>> > function signed_loop{D<:Unsigned, A<:Unsigned}(::Type{D}, r::A, data, 
>> > table::Vector{A}) 
>> >     local j = 0 
>> >      for i = 1 : div(length(data),20) 
>> >         r = (r >>> 8) $ table[1 + (data[j+=1]$convert(D,r))] 
>> [...] 
>> >         r = (r >>> 8) $ table[1 + (data[j+=1]$convert(D,r))] 
>> >     end 
>> >     return r 
>> > end 
>> > ~~~ 
>>
>> In that case, it's probably because zlib is processing the bytes four 
>> at a time, using four different CRC tables.  This is quite distinct 
>> from the loop unrolling, and can have a larger effect because it 
>> removes some of the data dependency between iterations.  It looks 
>> something like this (very untested!  I didn't have time to figure out 
>> how to make the four different CRC tables.) 
>>
>> # note: the trailing bytes (length not a multiple of 4) need special cases
>> data4 = reinterpret(Uint32, data)
>> for i = 1:length(data4)
>>     word::Uint32 = data4[i]
>>     r = r $ word
>>     # one lookup per byte of r; each byte is masked with & 0xff
>>     r = table3[1 + (r & 0xff)] $ table2[1 + ((r >> 8) & 0xff)] $
>>         table1[1 + ((r >> 16) & 0xff)] $ table0[1 + (r >> 24)]
>> end 
>>
>
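
For readers of the archive, here is a minimal sketch of the four-table idea
quoted above, including how the extra tables can be built. It is written in
Python rather than Julia so that it can be checked against the standard
library's zlib.crc32. It assumes the reflected CRC-32 polynomial 0xEDB88320
(the one zlib uses) and little-endian word order. All function and variable
names are illustrative, and this is not the code that was benchmarked in this
thread.

```python
import zlib

POLY = 0xEDB88320  # assumption: reflected CRC-32 polynomial, as used by zlib


def make_tables(n_tables=4):
    """Build lookup tables; tables[0] is the usual byte-at-a-time table."""
    t0 = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        t0.append(c)
    tables = [t0]
    # Table k is table k-1 advanced by one extra zero byte.
    for _ in range(1, n_tables):
        prev = tables[-1]
        tables.append([(prev[n] >> 8) ^ t0[prev[n] & 0xFF] for n in range(256)])
    return tables


TABLES = make_tables()


def crc32_bytewise(data):
    """Reference: one table lookup per input byte."""
    t0 = TABLES[0]
    r = 0xFFFFFFFF
    for b in data:
        r = (r >> 8) ^ t0[(r ^ b) & 0xFF]
    return r ^ 0xFFFFFFFF


def crc32_slice4(data):
    """Four bytes per iteration, using four independent table lookups."""
    t0, t1, t2, t3 = TABLES
    r = 0xFFFFFFFF
    n_words = len(data) // 4
    for i in range(0, 4 * n_words, 4):
        r ^= int.from_bytes(data[i:i + 4], "little")
        r = (t3[r & 0xFF] ^ t2[(r >> 8) & 0xFF]
             ^ t1[(r >> 16) & 0xFF] ^ t0[r >> 24])
    # Trailing bytes (length not a multiple of 4) fall back to byte-wise.
    for b in data[4 * n_words:]:
        r = (r >> 8) ^ t0[(r ^ b) & 0xFF]
    return r ^ 0xFFFFFFFF


msg = b"123456789"
assert crc32_slice4(msg) == crc32_bytewise(msg) == zlib.crc32(msg)
```

The four lookups in the inner loop do not depend on each other, only on the
value of r at the start of the iteration. That is the reduced data dependency
described in the quoted message, and it is separate from any loop unrolling.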
