On 24/12/2024 20:43, Sam Russell wrote:
ah sorry, clicked on the wrong patch file, here is the real one

On Tue, Dec 24, 2024, 19:36 Pádraig Brady <p...@draigbrady.com> wrote:

    On 24/12/2024 16:03, Sam Russell wrote:
     > I've released a new paper here https://arxiv.org/abs/2412.16398 and this
     > was the easiest algorithm to implement from it. It gets a 5-20% speedup for
     > SSE/AVX1 and diminishing returns for AVX2/AVX512

    Ignoring this as looks applicable to gnulib not coreutils,
    and I think you've already landed this in gnulib.

Ah thanks,
However this is a regression on i7-5600U at least:

$ truncate -s4G file

$ time src/cksum --debug file
cksum: avx512 support not detected
cksum: avx2 support not detected
cksum: using pclmul hardware support
4215202376 4294967296 file
real    0m1.445s
user    0m0.250s
sys     0m1.132s

$ git am < ~/0001-cksum-Implement-Chorba-algorithm-in-PCLMUL.patch
$ make

$ time src/cksum --debug file
cksum: avx512 support not detected
cksum: avx2 support not detected
cksum: using pclmul hardware support
4215202376 4294967296 file
real    0m1.969s
user    0m0.263s
sys     0m1.683s


(I've run this a few times, with similar timings).

cheers,
Pádraig
