On Tue, Dec 24, 2024 at 11:52:38PM +0000, Pádraig Brady wrote:
> However this is a regression on i7-5600U at least:

I'm seeing the same on older consumer hardware, even after the latest patch (i3-6100):

$ time ./cksum --debug /tmp/testfil
cksum: avx512 support not detected
cksum: avx2 support not detected
cksum: using pclmul hardware support
3018728591 68719476736 /tmp/testfil

real    0m27.717s
user    0m4.519s
sys     0m23.173s

$ time ./cksum_chorba --debug /tmp/testfil
cksum_chorba: avx512 support not detected
cksum_chorba: avx2 support not detected
cksum_chorba: using pclmul hardware support
3018728591 68719476736 /tmp/testfil

real    0m31.288s
user    0m6.863s
sys     0m24.404s
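As a rough sanity check on the i3-6100 numbers: the reported size of 68719476736 bytes is exactly 64 GiB, so wall-clock time converts directly into throughput. The Python sketch below uses only the `real` times above and assumes the file is read once, end to end.

```python
# Rough throughput from the i3-6100 timings quoted above.
# Assumption: the file is read exactly once, so throughput = size / real time.
FILE_BYTES = 68719476736  # size reported by cksum (exactly 64 GiB)
GIB = 1024 ** 3


def throughput_gib_s(real_seconds: float) -> float:
    """Return wall-clock throughput in GiB/s."""
    return FILE_BYTES / GIB / real_seconds


old = throughput_gib_s(27.717)  # ./cksum
new = throughput_gib_s(31.288)  # ./cksum_chorba
print(f"cksum:        {old:.2f} GiB/s")
print(f"cksum_chorba: {new:.2f} GiB/s")
print(f"change:       {(new / old - 1) * 100:+.1f}%")
```

On these two runs that works out to roughly 2.31 vs 2.05 GiB/s, which is about 11% less throughput for the chorba build.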


On older server hardware (E3-1240 v5) I do see a slight improvement in user time, but system time increases, and *not once* did I see the overall runtime decrease (I also ran them in the opposite order). Maybe this indicates that the change thrashes the CPU cache or some such?

$ time ./cksum --debug /tmp/testfil
cksum: avx512 support not detected
cksum: avx2 support not detected
cksum: using pclmul hardware support
3018728591 68719476736 /tmp/testfil

real    0m22.923s
user    0m3.956s
sys     0m18.867s

$ time ./cksum_chorba --debug /tmp/testfil
cksum_chorba: avx512 support not detected
cksum_chorba: avx2 support not detected
cksum_chorba: using pclmul hardware support
3018728591 68719476736 /tmp/testfil

real    0m23.962s
user    0m3.768s
sys     0m20.165s

$ time ./cksum_chorba --debug /tmp/testfil
cksum_chorba: avx512 support not detected
cksum_chorba: avx2 support not detected
cksum_chorba: using pclmul hardware support
3018728591 68719476736 /tmp/testfil

real    0m25.021s
user    0m3.776s
sys     0m21.235s

$ time ./cksum --debug /tmp/testfil
cksum: avx512 support not detected
cksum: avx2 support not detected
cksum: using pclmul hardware support
3018728591 68719476736 /tmp/testfil

real    0m23.961s
user    0m4.160s
sys     0m19.798s
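To put numbers on the user vs. system split, the sketch below averages the two runs of each binary on the E3-1240 v5. It uses only the timings quoted here, and two runs per binary is far too few for proper statistics.

```python
from statistics import mean

# E3-1240 v5 timings quoted above, as (real, user, sys) in seconds.
cksum = [(22.923, 3.956, 18.867), (23.961, 4.160, 19.798)]
chorba = [(23.962, 3.768, 20.165), (25.021, 3.776, 21.235)]


def averages(runs):
    """Average each column (real, user, sys) across runs."""
    return [mean(col) for col in zip(*runs)]


for name, base, cand in zip(("real", "user", "sys"), averages(cksum), averages(chorba)):
    # Relative change of the chorba build against the existing code.
    print(f"{name:>4}: {base:7.3f}s -> {cand:7.3f}s ({(cand / base - 1) * 100:+.1f}%)")
```

With these figures, mean user time drops by about 7% while mean system time rises by about 7%. Because system time is roughly five times larger than user time, real time ends up about 4.5% worse.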


On older AMD server hardware it's closer. Just as on the Intel hardware, user time decreases and system time increases, but the results are close enough that it's a wash, with sometimes one being faster and sometimes the other:

$ time ./cksum_chorba --debug /tmp/testfil
cksum_chorba: avx512 support not detected
cksum_chorba: avx2 support not detected
cksum_chorba: using pclmul hardware support
3018728591 68719476736 /tmp/testfil

real    0m14.509s
user    0m5.083s
sys     0m9.410s

$ time ./cksum --debug /tmp/testfil
cksum: avx512 support not detected
cksum: avx2 support not detected
cksum: using pclmul hardware support
3018728591 68719476736 /tmp/testfil

real    0m14.220s
user    0m5.626s
sys     0m8.578s


Cf. the same binaries on Zen 4 (EPYC 9354P), where the new code is a clear overall improvement:

$ time ./cksum --debug /tmp/testfil
cksum: using avx512 hardware support
3018728591 68719476736 /tmp/testfil

real    0m9.396s
user    0m1.720s
sys     0m7.676s

$ time ./cksum_chorba --debug /tmp/testfil
cksum_chorba: using avx512 hardware support
3018728591 68719476736 /tmp/testfil

real    0m8.769s
user    0m1.284s
sys     0m7.485s
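On Zen 4 the user-time gap is much larger than the real-time gap. The sketch below compares user-time throughput and the per-field change. It assumes, without a separate measurement, that user time roughly tracks the CPU spent in the CRC loop, while sys time is mostly the kernel reading the file.

```python
# Zen 4 (EPYC 9354P) timings quoted above, in seconds.
# Assumption (not measured separately): user time approximates the CPU time
# spent computing the CRC, while sys time is mostly the kernel reading the file.
FILE_BYTES = 68719476736
GIB = 1024 ** 3

timings = {
    "cksum": {"real": 9.396, "user": 1.720, "sys": 7.676},
    "cksum_chorba": {"real": 8.769, "user": 1.284, "sys": 7.485},
}

# Throughput if only user (CPU) time is counted.
for name, t in timings.items():
    cpu_rate = FILE_BYTES / GIB / t["user"]
    print(f"{name:13} user-time throughput: {cpu_rate:5.1f} GiB/s")

# Relative change of each field, chorba vs. existing code.
for field in ("real", "user", "sys"):
    before = timings["cksum"][field]
    after = timings["cksum_chorba"][field]
    print(f"{field:>4} change: {(after / before - 1) * 100:+.1f}%")
```

Here user time falls by about a quarter (1.720 s to 1.284 s). Because sys time dominates, however, real time improves by less than 7%.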

