Nikos Mavrogiannopoulos <[email protected]> writes:

> A quick and dirty patch to enable SSE2 instructions for memxor() on
> Intel CPUs is attached.
> I tried to follow the logic in the fat.c file, but I may have missed
> something. I've not added memxor3() because it is actually slower with
> SSE2.

Cool!

> SSE2:
>             memxor     aligned 26081.83
>             memxor   unaligned 25893.69
>
> No-SSE2:
>             memxor     aligned 17806.94
>             memxor   unaligned 16581.48

How confident are you that the Intel vs. AMD check is the right way
to enable SSE2? I guess we could add a check on the particular CPU model
later, if needed. Which model(s) did you benchmark on?
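
For reference, the kind of run-time selection under discussion might look
roughly like the C sketch below. It is not the code in fat.c or in the
patch: the names example_fat_init, example_memxor and memxor_vec are
hypothetical, the SSE2 variant uses compiler intrinsics rather than
assembly, and the vendor decision is passed in as a plain flag.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

typedef void *memxor_func(void *dst, const void *src, size_t n);

/* Portable byte-wise fallback. */
static void *
memxor_generic(void *dst, const void *src, size_t n)
{
  uint8_t *d = dst;
  const uint8_t *s = src;
  size_t i;

  for (i = 0; i < n; i++)
    d[i] ^= s[i];
  return dst;
}

#if defined(__SSE2__)
/* 16 bytes per iteration with unaligned SSE2 loads and stores,
   then a byte-wise tail. */
static void *
memxor_sse2(void *dst, const void *src, size_t n)
{
  uint8_t *d = dst;
  const uint8_t *s = src;
  size_t i = 0;

  for (; i + 16 <= n; i += 16)
    {
      __m128i a = _mm_loadu_si128((const __m128i *) (d + i));
      __m128i b = _mm_loadu_si128((const __m128i *) (s + i));
      _mm_storeu_si128((__m128i *) (d + i), _mm_xor_si128(a, b));
    }
  for (; i < n; i++)
    d[i] ^= s[i];
  return dst;
}
#endif

/* Entry point currently in use; defaults to the generic version. */
static memxor_func *memxor_vec = memxor_generic;

/* Hypothetical init hook: pick the SSE2 entry point only when the
   caller's vendor (or, later, model) check says so. */
static void
example_fat_init(int vendor_is_intel)
{
#if defined(__SSE2__)
  memxor_vec = vendor_is_intel ? memxor_sse2 : memxor_generic;
#else
  (void) vendor_is_intel;
#endif
}

static void *
example_memxor(void *dst, const void *src, size_t n)
{
  return memxor_vec(dst, src, n);
}
```

With this shape, replacing the vendor flag by a per-model check later
only touches the init function, not the two implementations.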

It would be nice in a way if we could share code with x86_64/memxor.asm.
E.g., by defining x86_64/fat/memxor-1.asm and x86_64/fat/memxor-2.asm
which each include the same file with a different setting of USE_SSE2.

But I haven't looked at that carefully; it might be better to have a
unified x86_64/fat/memxor.asm with two entry points, like you do.

I've also been considering m4 hacks to let a single fat .asm file
include multiple other .asm files, or including the same file twice,
without labels or m4 definitions colliding, but I'm not sure that's
worth the effort. The foo-1.asm, foo-2.asm, ... scheme is a bit
inelegant, but it is easy to understand.

> +  _nettle_cpuid (0, cpuid_data);
> +  if (memcmp(&cpuid_data[1], "Genu", 4) == 0 &&
> +      memcmp(&cpuid_data[3], "ineI", 4) == 0 &&
> +      memcmp(&cpuid_data[2], "ntel", 4) == 0) {

This could also be written as a single memcmp call, or 3 comparisons of
integers.
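
A minimal sketch of the three variants, assuming cpuid_data is an array
of four 32-bit words holding EAX, EBX, ECX, EDX in that order (as the
indices in the quoted patch suggest). The function names are
hypothetical, and the integer constants assume a little-endian host,
which holds on x86.

```c
#include <stdint.h>
#include <string.h>

/* The check as written in the patch: three 4-byte comparisons.
   The vendor string is stored in EBX, EDX, ECX order. */
static int
is_intel_three_memcmp(const uint32_t *cpuid_data)
{
  return memcmp(&cpuid_data[1], "Genu", 4) == 0
    && memcmp(&cpuid_data[3], "ineI", 4) == 0
    && memcmp(&cpuid_data[2], "ntel", 4) == 0;
}

/* One 12-byte comparison. Words 1..3 are EBX, ECX, EDX in memory,
   so the reference string is in register order, not reading order. */
static int
is_intel_one_memcmp(const uint32_t *cpuid_data)
{
  return memcmp(&cpuid_data[1], "GenuntelineI", 12) == 0;
}

/* Three integer comparisons; each constant is the little-endian
   encoding of the corresponding 4-character piece. */
static int
is_intel_integers(const uint32_t *cpuid_data)
{
  return cpuid_data[1] == 0x756e6547    /* "Genu" */
    && cpuid_data[3] == 0x49656e69      /* "ineI" */
    && cpuid_data[2] == 0x6c65746e;     /* "ntel" */
}
```

The single-memcmp form is the shortest, but the reordered string is easy
to misread; the integer form avoids the library calls at the cost of
magic constants.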

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
_______________________________________________
nettle-bugs mailing list
[email protected]
http://lists.lysator.liu.se/mailman/listinfo/nettle-bugs