Re: [PATCH] cksum: Implement Chorba algorithm in PCLMUL

Sam Russell Wed, 25 Dec 2024 11:07:52 -0800

I agree. Looking over CPU specs, this looks like it will actually be a
regression on older hardware: a lot of 5-10 year old CPUs have 32-64kB of
L1 cache and not much more L2, whereas AMD is shipping 3MB L2 caches,
which explains the boost there.


I have some old laptops at home I can play around with, so I'll tune on
those and resubmit once I have more confidence in the speed boost.

On Wed, Dec 25, 2024, 19:57 Pádraig Brady <[email protected]> wrote:

> On 25/12/2024 16:55, Sam Russell wrote:
> > Thanks for the results, looks like I'll need to get access to some
> > older hardware and try some different combinations. There's a few
> > things I can tune (loading all 8 values at the start vs loading one
> > per fold, different BUFSIZE values), I'd be interested in finding a
> > setup that definitely offers an improvement across the board.
> >
> > Did you test this with the first patch or the second patch? At a
> > minimum cutting out the final table-based fold should be a consistent
> > ~5% improvement on any platform.
>
> It would be good to test chorba without also increasing the buffer size
> so we're comparing just the algorithms.
>
> We can tweak the buffer sizes after,
> though note ioblksize.h is currently set to 256KiB
> so it would be good to be <= that.
>
> cheers,
> Pádraig
>
