On Thu, Jun 20, 2013 at 10:03 PM, Mattias Engdegård <matti...@bredband.net> wrote:
> On 20 Jun 2013 at 21:44, Stefan Fuhrmann wrote:
>
>> A capable compiler should unroll the inner loop
>> such that we end up with ~10 cycles / 4 bytes.
>> That would be slightly faster than the "* 33" loop.
>>
>
> That depends on a lot of things (such as the latency/throughput of the
> multiplier).
>

Of course. I checked the specs and SPARC turns out to be
2-issue, with T4 being OOO. And I simply assume that it
can handle one multiplication every 10 cycles ;) I'm more
worried about the compiler not being aggressive enough.

> By the way, the new inner loop suffers from signed overflow (undefined
> behaviour), and also sign extension when char is signed (which it is on
> SPARC). Both need to be fixed.

Good catch. There is a similar issue with the "* 33" loop,
although in practice both should simply produce worse hash
distributions than necessary. Fixed in r1495204.

>> I had preferred the other patch for its simplicity.
>> However, I'm fine with the current one and voted
>> for its backport to 1.8.x. It gives us target-independent
>> cache behavior - which is a good thing.
>>
>
> No it doesn't. The code already produced different hashes on x86 and ppc
> because of differences in byte order.
>

You are right. I was not precise here: I meant that SPARC now
uses the same hash as x86. After some IRC discussion, we added
r1495209, which provides actual platform independence.

-- Stefan^2.
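
For illustration, the issues raised in this thread (signed overflow, sign extension of char, and byte-order dependence) can be avoided roughly as follows. This is a minimal sketch and not the code committed in r1495204 or r1495209: the function names hash_times33 and hash_chunked and the multiplier CHUNK_FACTOR are assumptions made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Arbitrary odd multiplier, chosen for this sketch only (an assumption,
   not the constant used by the real code). */
#define CHUNK_FACTOR 0x9E3779B1u

/* Hypothetical byte-wise "* 33" hash.  Reading through unsigned char
   avoids sign extension where plain char is signed; accumulating in
   uint32_t makes overflow wrap around instead of being undefined. */
static uint32_t hash_times33(const char *data, size_t len)
{
  const unsigned char *p = (const unsigned char *)data;
  uint32_t hash = 0;
  size_t i;

  for (i = 0; i < len; ++i)
    hash = hash * 33u + p[i];

  return hash;
}

/* Hypothetical chunked variant: consumes 4 bytes per step.  Each chunk
   is assembled from individual bytes in a fixed (little-endian) order,
   so the result does not depend on the byte order of the host. */
static uint32_t hash_chunked(const char *data, size_t len)
{
  const unsigned char *p = (const unsigned char *)data;
  uint32_t hash = 0;
  size_t i = 0;

  for (; i + 4 <= len; i += 4)
    {
      uint32_t chunk = (uint32_t)p[i]
                     | ((uint32_t)p[i + 1] << 8)
                     | ((uint32_t)p[i + 2] << 16)
                     | ((uint32_t)p[i + 3] << 24);
      hash = hash * CHUNK_FACTOR + chunk;
    }

  /* Remaining 0..3 bytes are folded in with the byte-wise step. */
  for (; i < len; ++i)
    hash = hash * 33u + p[i];

  return hash;
}
```

With a signed char and a signed accumulator, a byte such as 0xff would enter the hash as -1 instead of 255. Assembling each chunk with shifts instead of casting the pointer to a 32-bit integer also sidesteps alignment problems, at the cost of relying on the compiler to turn the four byte loads into a single word load where that is legal.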