On 24/12/2024 20:43, Sam Russell wrote:
ah sorry, clicked on the wrong patch file, here is the real one
On Tue, Dec 24, 2024, 19:36 Pádraig Brady <p...@draigbrady.com> wrote:
On 24/12/2024 16:03, Sam Russell wrote:
> I've released a new paper here https://arxiv.org/abs/2412.16398 and this
> was the easiest algorithm to implement from it. It gets a 5-20% speedup for
> SSE/AVX1 and diminishing returns for AVX2/AVX512
Ignoring this as looks applicable to gnulib not coreutils,
and I think you've already landed this in gnulib.
Ah thanks,
However this is a regression on i7-5600U at least:
$ truncate -s4G file
$ time src/cksum --debug file
cksum: avx512 support not detected
cksum: avx2 support not detected
cksum: using pclmul hardware support
4215202376 4294967296 file
real 0m1.445s
user 0m0.250s
sys 0m1.132s
$ git am < ~/0001-cksum-Implement-Chorba-algorithm-in-PCLMUL.patch
$ make
$ time src/cksum --debug file
cksum: avx512 support not detected
cksum: avx2 support not detected
cksum: using pclmul hardware support
4215202376 4294967296 file
real 0m1.969s
user 0m0.263s
sys 0m1.683s
(I've run this a few times, with similar timings).
cheers,
Pádraig