Linus Torvalds <torva...@linux-foundation.org> writes:

> I pretty much can guarantee you that it improves things only because it
> makes gcc generate crap code, which then hides some of the P4 issues.
>
> I'd also suggest you try gcc-4.4, since that apparently fixes some of the
> oddest spill issues.

Thanks for the hint. I tried gcc-4.4, and it produces slower code than 4.3
for the gnulib SHA1 implementation; my patch makes it slower still!

I noticed that on my machine your implementation is ~30-40% faster when it
uses SHA_ROT for the rol/ror operations instead of inline assembly, at least
with the test case Pádraig wrote. Am I the only one reporting this?

Cheers,
Giuseppe
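
For reference, the two rotate variants being compared can be sketched roughly as follows. This is only an illustration under assumptions: the names SHA_ROL_C, SHA_ROR_C, SHA_ASM, SHA_ROL_ASM, SHA_ROR_ASM and rotates_agree are made up for this sketch, and the actual definitions in the implementation under discussion may differ. The plain C form leaves instruction selection to the compiler, while the inline-assembly form forces a rol/ror with an immediate count.

```c
#include <assert.h>
#include <stdint.h>

/* Plain C rotate: shift left by l, shift right by r, OR the two parts.
 * The rotate count n must be in the range 1..31. */
#define SHA_ROT(X, l, r)  (((X) << (l)) | ((X) >> (r)))
#define SHA_ROL_C(X, n)   SHA_ROT((uint32_t)(X), (n), 32 - (n))
#define SHA_ROR_C(X, n)   SHA_ROT((uint32_t)(X), 32 - (n), (n))

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Inline-assembly rotate (GNU C, x86 only): emits the named instruction
 * with an immediate count, so n must be a compile-time constant. */
#define SHA_ASM(op, x, n) __extension__ ({                  \
        uint32_t res_;                                      \
        __asm__(op " %1,%0"                                 \
                : "=r" (res_)                               \
                : "i" (n), "0" ((uint32_t)(x)));            \
        res_; })
#define SHA_ROL_ASM(x, n) SHA_ASM("rol", x, n)
#define SHA_ROR_ASM(x, n) SHA_ASM("ror", x, n)
#else
/* Elsewhere, fall back to the plain C form so the sketch still builds. */
#define SHA_ROL_ASM(x, n) SHA_ROL_C(x, n)
#define SHA_ROR_ASM(x, n) SHA_ROR_C(x, n)
#endif

/* Returns 1 when both variants agree for the rotate counts SHA-1 uses
 * (rol 5, rol 30 expressed as ror 2, and rol 1). */
static int rotates_agree(uint32_t x)
{
    assert(SHA_ROL_C(SHA_ROR_C(x, 2), 2) == x);  /* ror undoes rol */
    return SHA_ROL_ASM(x, 5) == SHA_ROL_C(x, 5)
        && SHA_ROR_ASM(x, 2) == SHA_ROR_C(x, 2)
        && SHA_ROL_ASM(x, 1) == SHA_ROL_C(x, 1);
}
```

Both forms compute the same value, so any speed difference comes from how the compiler schedules and allocates registers around them, which is why the result can vary between gcc versions and CPUs.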