On Wed, Oct 18, 2017 at 5:31 PM, Timofey Titovets <nefelim...@gmail.com> wrote:
>> +static int zswap_is_page_same_filled(void *ptr, unsigned long *value)
>> +{
>> +	unsigned int pos;
>> +	unsigned long *page;
>> +
>> +	page = (unsigned long *)ptr;
>> +	for (pos = 1; pos < PAGE_SIZE / sizeof(*page); pos++) {
>> +		if (page[pos] != page[0])
>> +			return 0;
>> +	}
>> +	*value = page[0];
>> +	return 1;
>> +}
>> +
>
> In theory you can speedup that check by memcmp(),
> And do something like first:
> memcmp(ptr, ptr + PAGE_SIZE/sizeof(*page)/2, PAGE_SIZE/2);
> After compare 1/4 with 2/4
> Then 1/8 with 2/8.
> And after do you check with pattern, only on first 512 bytes.
>
> Just because memcmp() on fresh CPU are crazy fast.
> That can easy make you check less expensive.
I did check this, and it is actually significantly worse; keep in mind
that while doing it that way may mean a smaller loop, it actually does
more memory comparisons.

>
>> +static void zswap_fill_page(void *ptr, unsigned long value)
>> +{
>> +	unsigned int pos;
>> +	unsigned long *page;
>> +
>> +	page = (unsigned long *)ptr;
>> +	if (value == 0)
>> +		memset(page, 0, PAGE_SIZE);
>> +	else {
>> +		for (pos = 0; pos < PAGE_SIZE / sizeof(*page); pos++)
>> +			page[pos] = value;
>> +	}
>> +}
>
> Same here, but with memcpy().
>
> P.S.
> I'm just too busy to make fast performance test in user space,
> but my recent experience with that CPU commands, show what that make a sense:
> KSM patch: https://patchwork.kernel.org/patch/9980803/
> User space tests: https://github.com/Nefelim4ag/memcmpe
> PAGE_SIZE: 65536, loop count: 1966080
> memcmp: -28 time: 3216 ms, th: 40064.644611 MiB/s
> memcmpe: -28, offset: 62232 time: 3588 ms, th: 35902.462390 MiB/s
> memcmpe: -28, offset: 62232 time: 71 ms, th: 1792233.164286 MiB/s
>
> IIRC, with code like our, you must see ~2.5GiB/s
>
> Thanks.
> --
> Have a nice day,
> Timofey.
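The point that a smaller loop can still mean more memory work can be illustrated with a minimal userspace sketch, in C to match the patch. It ports the word loop from the patch and implements one reading of the memcmp() halving suggestion (compare halves, then quarters, then eighths, down to 512 bytes, then check that prefix word by word). All names here (`same_filled_loop`, `same_filled_halving`, `bytes_read_loop`, `bytes_read_halving`, `SKETCH_PAGE_SIZE`, `PATTERN_CHUNK`) are hypothetical, and a 4 KiB page is assumed. Nothing is timed: the byte counts are a simple model of a page that *is* same-filled (the full-scan case), assuming each memcmp() of `len` bytes reads `2 * len` bytes and that the loop keeps `page[0]` in a register.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: 4 KiB pages; the kernel's PAGE_SIZE is architecture-dependent. */
#define SKETCH_PAGE_SIZE 4096UL
/* Prefix size checked word by word in the suggested variant (512 bytes). */
#define PATTERN_CHUNK 512UL

/* Userspace port of the patch's loop: compare every word against word 0. */
static int same_filled_loop(const void *ptr, unsigned long *value)
{
	const unsigned long *page = ptr;
	size_t pos;

	for (pos = 1; pos < SKETCH_PAGE_SIZE / sizeof(*page); pos++) {
		if (page[pos] != page[0])
			return 0;
	}
	*value = page[0];
	return 1;
}

/*
 * Hypothetical version of the memcmp() suggestion: compare the first half
 * with the second half, the first quarter with the second quarter, and so
 * on down to PATTERN_CHUNK bytes, then check that prefix word by word.
 */
static int same_filled_halving(const void *ptr, unsigned long *value)
{
	const unsigned char *bytes = ptr;
	const unsigned long *page = ptr;
	size_t len, pos;

	for (len = SKETCH_PAGE_SIZE / 2; len >= PATTERN_CHUNK; len /= 2) {
		if (memcmp(bytes, bytes + len, len) != 0)
			return 0;
	}
	/* The page is now known to be the first PATTERN_CHUNK bytes repeated. */
	for (pos = 1; pos < PATTERN_CHUNK / sizeof(*page); pos++) {
		if (page[pos] != page[0])
			return 0;
	}
	*value = page[0];
	return 1;
}

/* Modelled bytes read for a same-filled page: every word is loaded once. */
static size_t bytes_read_loop(void)
{
	return SKETCH_PAGE_SIZE;
}

/* Modelled bytes read for a same-filled page by the halving variant. */
static size_t bytes_read_halving(void)
{
	size_t len, total = 0;

	/* Each memcmp() of len bytes reads len bytes from each of two regions. */
	for (len = SKETCH_PAGE_SIZE / 2; len >= PATTERN_CHUNK; len /= 2)
		total += 2 * len;
	/* Plus the word-by-word check of the prefix. */
	return total + PATTERN_CHUNK;
}
```

Under these assumptions the loop reads 4096 bytes of a same-filled page, while the halving variant reads 7680 (4096 + 2048 + 1024 for the three memcmp() calls, plus the 512-byte prefix). Real timings depend on the CPU and on the memcmp() implementation, which is why the reply's actual measurement matters more than this count.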