That's interesting. I'm having trouble testing across cfarm, as many of the
machines don't have the coreutils dependencies and won't work with the
version of clib I'm building against.

Are you comparing the user times or the real times? IMO the user time is
the important part as the sys part of the timing just depends on disk I/O.
The high I/O (and the fact that we're only reading in 64KB chunks) means
that there's going to be large variance, but I'm still seeing a consistent
improvement over 5-10 runs.
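
For anyone who wants to repeat the comparison, here is a rough sketch of how the user and sys figures could be collected separately over several runs. It is not what I used for the numbers below (those are plain `time` output). The helper names `cpu_times` and `sample` are made up for illustration, and it assumes a Unix host where Python's `resource` module is available.

```python
import resource
import subprocess
import sys


def cpu_times(argv):
    """Run argv once and return (user, sys) CPU seconds used by the child."""
    # RUSAGE_CHILDREN accumulates over all waited-for children, so take the
    # difference around a single run.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)


def sample(argv, runs=5):
    """Repeat the measurement to get a feel for run-to-run variance."""
    return [cpu_times(argv) for _ in range(runs)]


if __name__ == "__main__":
    # Example: python3 sample.py ./cksum_pclmul file
    # Falls back to a trivial command when no arguments are given.
    argv = sys.argv[1:] or [sys.executable, "-c", "pass"]
    for user, system in sample(argv):
        print(f"user {user:.3f}s  sys {system:.3f}s")
```

Keeping user and sys apart makes it easier to see whether a difference comes from the checksum code or from the kernel side of the reads.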

On Amazon EC2 t3 (Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz)

ubuntu@ip-172-31-40-136:~$ time ./cksum_pclmul --debug file
cksum_pclmul: using pclmul hardware support
4215202376 4294967296 file

real    0m3.129s
user    0m0.422s
sys     0m2.705s
ubuntu@ip-172-31-40-136:~$ time ./cksum_pclmul --debug file
cksum_pclmul: using pclmul hardware support
4215202376 4294967296 file

real    0m3.025s
user    0m0.394s
sys     0m2.630s
ubuntu@ip-172-31-40-136:~$ time ./cksum_pclmul --debug file
cksum_pclmul: using pclmul hardware support
4215202376 4294967296 file

real    0m3.705s
user    0m0.517s
sys     0m3.187s
ubuntu@ip-172-31-40-136:~$ time ./cksum_pclmul --debug file
cksum_pclmul: using pclmul hardware support
4215202376 4294967296 file

real    0m3.334s
user    0m0.431s
sys     0m2.903s
ubuntu@ip-172-31-40-136:~$ time ./cksum_pclmul --debug file
cksum_pclmul: using pclmul hardware support
4215202376 4294967296 file

real    0m3.250s
user    0m0.420s
sys     0m2.829s
ubuntu@ip-172-31-40-136:~$ time ./cksum_pclmul_chorba --debug file
cksum_pclmul_chorba: avx512 support not detected
cksum_pclmul_chorba: using pclmul hardware support
4215202376 4294967296 file

real    0m2.888s
user    0m0.368s
sys     0m2.518s
ubuntu@ip-172-31-40-136:~$ time ./cksum_pclmul_chorba --debug file
cksum_pclmul_chorba: avx512 support not detected
cksum_pclmul_chorba: using pclmul hardware support
4215202376 4294967296 file

real    0m3.032s
user    0m0.366s
sys     0m2.665s
ubuntu@ip-172-31-40-136:~$ time ./cksum_pclmul_chorba --debug file
cksum_pclmul_chorba: avx512 support not detected
cksum_pclmul_chorba: using pclmul hardware support
4215202376 4294967296 file

real    0m2.938s
user    0m0.347s
sys     0m2.583s
ubuntu@ip-172-31-40-136:~$ time ./cksum_pclmul_chorba --debug file
cksum_pclmul_chorba: avx512 support not detected
cksum_pclmul_chorba: using pclmul hardware support
4215202376 4294967296 file

real    0m3.148s
user    0m0.419s
sys     0m2.728s
ubuntu@ip-172-31-40-136:~$ time ./cksum_pclmul_chorba --debug file
cksum_pclmul_chorba: avx512 support not detected
cksum_pclmul_chorba: using pclmul hardware support
4215202376 4294967296 file

real    0m2.808s
user    0m0.344s
sys     0m2.463s

cfarm13 (Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz)

pljeskavica@cfarm13:~/coreutils$ time ./cksum_pclmul file
4215202376 4294967296 file

real    0m1.103s
user    0m0.436s
sys     0m0.667s
pljeskavica@cfarm13:~/coreutils$ time ./cksum_pclmul file
4215202376 4294967296 file

real    0m1.320s
user    0m0.464s
sys     0m0.855s
pljeskavica@cfarm13:~/coreutils$ time ./cksum_pclmul file
4215202376 4294967296 file

real    0m1.641s
user    0m0.416s
sys     0m1.224s
pljeskavica@cfarm13:~/coreutils$ time ./cksum_pclmul file
4215202376 4294967296 file

real    0m1.714s
user    0m0.496s
sys     0m1.214s
pljeskavica@cfarm13:~/coreutils$ time ./cksum_pclmul file
4215202376 4294967296 file

real    0m1.107s
user    0m0.457s
sys     0m0.650s
pljeskavica@cfarm13:~/coreutils$ time ./cksum_pclmul_chorba file
4215202376 4294967296 file

real    0m1.091s
user    0m0.485s
sys     0m0.606s
pljeskavica@cfarm13:~/coreutils$ time ./cksum_pclmul_chorba file
4215202376 4294967296 file

real    0m1.083s
user    0m0.483s
sys     0m0.600s
pljeskavica@cfarm13:~/coreutils$ time ./cksum_pclmul_chorba file
4215202376 4294967296 file

real    0m1.102s
user    0m0.403s
sys     0m0.699s
pljeskavica@cfarm13:~/coreutils$ time ./cksum_pclmul_chorba file
4215202376 4294967296 file

real    0m1.081s
user    0m0.412s
sys     0m0.669s
pljeskavica@cfarm13:~/coreutils$ time ./cksum_pclmul_chorba file
4215202376 4294967296 file

real    0m1.077s
user    0m0.412s
sys     0m0.665s
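
To make the user times above easier to compare, here is a small sketch that summarises them. The numbers are copied from the `user` lines in the runs above. The `improvement` helper and the choice of median (to dampen outlier runs) are my own assumptions for illustration, not a rigorous benchmark methodology.

```python
from statistics import mean, median

# "user" seconds copied from the five runs of each binary on each host.
USER_SECONDS = {
    "EC2 t3": {
        "pclmul": [0.422, 0.394, 0.517, 0.431, 0.420],
        "chorba": [0.368, 0.366, 0.347, 0.419, 0.344],
    },
    "cfarm13": {
        "pclmul": [0.436, 0.464, 0.416, 0.496, 0.457],
        "chorba": [0.485, 0.483, 0.403, 0.412, 0.412],
    },
}


def improvement(baseline, candidate, stat=median):
    """Percentage reduction of the candidate's user time vs. the baseline."""
    return 100.0 * (1.0 - stat(candidate) / stat(baseline))


for host, runs in USER_SECONDS.items():
    base, new = runs["pclmul"], runs["chorba"]
    print(f"{host}: median {median(base):.3f}s -> {median(new):.3f}s "
          f"({improvement(base, new):.1f}%), "
          f"mean {mean(base):.3f}s -> {mean(new):.3f}s "
          f"({improvement(base, new, stat=mean):.1f}%)")
```

With only five runs per binary, these figures are indicative at best. The gap between the mean and median on cfarm13 shows how much a couple of slow runs can move the result.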

If anyone has an i7 server I can test on, I'd be happy to get more results.
I was also working on another change earlier that gives a 5-10%
improvement, which can likewise get lost in the noise of the variance. I can
combine the two if we need a stronger improvement to consider taking this
change.

On Wed, 25 Dec 2024 at 00:52, Pádraig Brady <p...@draigbrady.com> wrote:

> On 24/12/2024 20:43, Sam Russell wrote:
> > ah sorry, clicked on the wrong patch file, here is the real one
> >
> > On Tue, Dec 24, 2024, 19:36 Pádraig Brady <p...@draigbrady.com> wrote:
> >
> >     On 24/12/2024 16:03, Sam Russell wrote:
> >      > I've released a new paper here https://arxiv.org/abs/2412.16398 and this
> >      > was the easiest algorithm to implement from it. It gets a 5-20% speedup for
> >      > SSE/AVX1 and diminishing returns for AVX2/AVX512
> >
> >     Ignoring this as looks applicable to gnulib not coreutils,
> >     and I think you've already landed this in gnulib.
>
> Ah thanks,
> However this is a regression on i7-5600U at least:
>
> $ truncate -s4G file
>
> $ time src/cksum --debug file
> cksum: avx512 support not detected
> cksum: avx2 support not detected
> cksum: using pclmul hardware support
> 4215202376 4294967296 file
> real    0m1.445s
> user    0m0.250s
> sys     0m1.132s
>
> $ git am < ~/0001-cksum-Implement-Chorba-algorithm-in-PCLMUL.patch
> $ make
>
> $ time src/cksum --debug file
> cksum: avx512 support not detected
> cksum: avx2 support not detected
> cksum: using pclmul hardware support
> 4215202376 4294967296 file
> real    0m1.969s
> user    0m0.263s
> sys     0m1.683s
>
>
> (I've run this a few times, with similar timings).
>
> cheers,
> Pádraig
>
