On Thu, Jun 20, 2013 at 10:03 PM, Mattias Engdegård
<matti...@bredband.net> wrote:

> On 20 Jun 2013, at 21:44, Stefan Fuhrmann wrote:
>
>
>> A capable compiler should unroll the inner loop
>> such that we end up with ~10 cycles / 4 bytes.
>> That would be slightly faster than the "* 33" loop.
>>
>
> That depends on a lot of things (such as the latency/throughput of the
> multiplier).
>

Of course. I checked the specs and SPARC turns out to be
2-issue, with T4 being OOO. And I simply assume that it
can handle one multiplication every 10 cycles ;) I'm more
worried about the compiler not being aggressive enough.

> By the way, the new inner loop suffers from signed overflow (undefined
> behaviour), and also sign extension when char is signed (which it is on
> SPARC). Both need to be fixed.


Good catch. There is a similar issue with the "*33" loop,
although in practice both should simply produce worse
hash distributions than necessary. Fixed in r1495204.
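
For illustration, here is a minimal sketch of a "* 33" loop that avoids both
problems. It is not the code from r1495204; the function name hash33 and its
signature are made up for this example. Each byte is cast to unsigned char
so that no sign extension happens, and the accumulator is unsigned so that
overflow wraps around instead of being undefined.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the actual Subversion function.
 * Computes hash = hash * 33 + byte over the whole buffer. */
static uint32_t hash33(const char *data, size_t len)
{
  uint32_t hash = 0;
  size_t i;

  for (i = 0; i < len; ++i)
    {
      /* The cast to unsigned char prevents sign extension of bytes
       * >= 0x80 on platforms where plain char is signed. The unsigned
       * accumulator makes wraparound well-defined (modulo 2^32). */
      hash = hash * 33u + (unsigned char)data[i];
    }

  return hash;
}
```

With this form, a byte such as 0xff contributes 255 on every platform,
whether plain char is signed or not.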

>> I had preferred the other patch for its simplicity.
>> However, I'm fine with the current one and voted
>> for its backport to 1.8.x. It gives us target-independent
>> cache behavior - which is a good thing.
>>
>
> No it doesn't. The code already produced different hashes on x86 and ppc
> because of differences in byte order.
>

You are right. I was not precise here: I meant that SPARC
now uses the same hash as x86.

After some IRC discussion, we added r1495209 which
provides actual platform-independence.
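
One common way to get byte-order independence when hashing 4 bytes at a
time is to assemble each chunk from individual bytes in a fixed order,
rather than reading it through an integer pointer. The sketch below shows
that idea only; I am not claiming it is what r1495209 does, and the name
load_le32 is invented for the example.

```c
#include <stdint.h>

/* Hypothetical helper: combine 4 bytes into a 32-bit value, always
 * treating p[0] as the least significant byte. The result is the same
 * on little-endian (x86) and big-endian (SPARC, ppc) hosts, and the
 * byte-wise access has no alignment requirements. */
static uint32_t load_le32(const unsigned char *p)
{
  return (uint32_t)p[0]
       | ((uint32_t)p[1] << 8)
       | ((uint32_t)p[2] << 16)
       | ((uint32_t)p[3] << 24);
}
```

A hash that feeds its inner loop from such a helper sees the same chunk
values everywhere, at the cost of relying on the compiler to turn the four
byte loads back into a single word load where that is possible.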

-- Stefan^2.
