Re: [PATCH 7/7] slub: do prefetching in kmem_cache_alloc_bulk()

Alexander Duyck Mon, 28 Sep 2015 07:54:25 -0700

On 09/28/2015 05:26 AM, Jesper Dangaard Brouer wrote:

For practical use-cases it is beneficial to prefetch the next freelist
object in the bulk allocation loop.


Micro benchmarking shows approx. 1 cycle change:

bulk -  prev-patch     -  this patch
    1 -  49 cycles(tsc) - 49 cycles(tsc) - increase in cycles:0
    2 -  30 cycles(tsc) - 31 cycles(tsc) - increase in cycles:1
    3 -  23 cycles(tsc) - 25 cycles(tsc) - increase in cycles:2
    4 -  20 cycles(tsc) - 22 cycles(tsc) - increase in cycles:2
    8 -  18 cycles(tsc) - 19 cycles(tsc) - increase in cycles:1
   16 -  17 cycles(tsc) - 18 cycles(tsc) - increase in cycles:1
   30 -  18 cycles(tsc) - 17 cycles(tsc) - increase in cycles:-1
   32 -  18 cycles(tsc) - 19 cycles(tsc) - increase in cycles:1
   34 -  23 cycles(tsc) - 24 cycles(tsc) - increase in cycles:1
   48 -  21 cycles(tsc) - 22 cycles(tsc) - increase in cycles:1
   64 -  20 cycles(tsc) - 21 cycles(tsc) - increase in cycles:1
  128 -  27 cycles(tsc) - 27 cycles(tsc) - increase in cycles:0
  158 -  30 cycles(tsc) - 30 cycles(tsc) - increase in cycles:0
  250 -  37 cycles(tsc) - 37 cycles(tsc) - increase in cycles:0

Note, benchmark done with slab_nomerge to keep it stable enough
for accurate comparison.

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
---
  mm/slub.c |    2 ++
  1 file changed, 2 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index c25717ab3b5a..5af75a618b91 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2951,6 +2951,7 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
                                goto error;

                        c = this_cpu_ptr(s->cpu_slab);
+                       prefetch_freepointer(s, c->freelist);
                        continue; /* goto for-loop */
                }

@@ -2960,6 +2961,7 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
                        goto error;

                c->freelist = get_freepointer(s, object);
+               prefetch_freepointer(s, c->freelist);
                p[i] = object;

                /* kmem_cache debug support */

I can see the prefetch in the last item case being possibly useful since
you have time between when you call the prefetch and when you are
accessing the next object. However, is there any actual benefit to
prefetching inside the loop itself? Based on your data above it doesn't
seem like that is the case since you are now adding one additional cycle
to the allocation and I am not seeing any actual gain reported here.
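
To make the question concrete, here is a minimal userspace sketch of the
two cases, not the mm/slub.c code. It assumes a hypothetical struct obj
with the free pointer stored at offset 0, uses the GCC/Clang builtin
__builtin_prefetch() as a stand-in for prefetch_freepointer(), and the
names bulk_alloc(), build_freelist() and the prefetch_in_loop flag are
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a slab object: free pointer at offset 0. */
struct obj {
	struct obj *next_free;
	char payload[56];
};

/* Userspace analogue of prefetch_freepointer(): hint the next free object. */
static inline void prefetch_next(const struct obj *o)
{
	if (o)
		__builtin_prefetch(o, 0 /* read */, 3 /* keep in cache */);
}

/*
 * Pop up to n objects off *freelist into p[]. When prefetch_in_loop is
 * non-zero, a prefetch for the new list head is issued after every pop,
 * which is the in-loop case being questioned. Returns the number of
 * objects actually handed out.
 */
static size_t bulk_alloc(struct obj **freelist, struct obj **p, size_t n,
			 int prefetch_in_loop)
{
	size_t i;

	assert(freelist && p);
	for (i = 0; i < n && *freelist; i++) {
		struct obj *object = *freelist;

		*freelist = object->next_free;
		if (prefetch_in_loop)
			prefetch_next(*freelist);
		p[i] = object;
	}
	/* Last-item case: the caller gets time before the next allocation. */
	if (!prefetch_in_loop)
		prefetch_next(*freelist);
	return i;
}

/* Thread a freelist through a contiguous pool of count objects. */
static struct obj *build_freelist(struct obj *pool, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++)
		pool[i].next_free = (i + 1 < count) ? &pool[i + 1] : NULL;
	return count ? &pool[0] : NULL;
}
```

Both settings of prefetch_in_loop hand out the same objects; only the
number of prefetch hints differs. The in-loop hint has almost no distance
before the next iteration dereferences the same object, which is why I
would only expect the last-item prefetch to have a chance of paying off.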


- Alex