Re: [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT)

2014-10-27 Thread Christoph Lameter
On Mon, 27 Oct 2014, Joonsoo Kim wrote:

> > One other aspect of this patchset is that it reduces the cache footprint
> > of the alloc and free functions. This typically results in a performance
> > increase for the allocator. If we can avoid the page_address() and
> > virt_to_head_page() stuff that is required because we drop the ->page
> > field in a sufficient number of places then this may be a benefit that
> > goes beyond the RT and CONFIG_PREEMPT case.
>
> Yeah... if we can avoid those function calls, it would be good.

One trick that may be possible is to have an address mask for the
page_address. If a pointer satisfies the mask requirements then it is on
the right page and we do not need to call virt_to_head_page().
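
A minimal sketch of such a check, assuming an order-0 slab page so that an
object and its freelist pointer share all address bits above PAGE_SHIFT
(the helper name is hypothetical):

static inline bool on_same_slab_page(const void *object, const void *fp)
{
	/*
	 * Both addresses lie on the same page iff their page-aligned
	 * bases match; only then can virt_to_head_page() be skipped.
	 * Higher-order slab pages would need a wider mask.
	 */
	return ((unsigned long)object & PAGE_MASK) ==
	       ((unsigned long)fp & PAGE_MASK);
}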

> But the current struct kmem_cache_cpu occupies just 32 bytes on a 64-bit
> machine, which means just 1 cacheline. Reducing the size of the struct may
> have no remarkable performance benefit in this case.

Hmmm... If we also drop the partial field then a 64-byte cacheline would
fit the kmem_cache_cpu structs of 4 caches. If we place them correctly then
the frequently used caches could avoid fetching up to 3 cachelines.
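
For reference, the struct in question looks roughly like this (paraphrased
from slub_def.h of that era; exact layout and CONFIG_SLUB_STATS fields may
differ), which is where the size arithmetic above comes from:

struct kmem_cache_cpu {
	void **freelist;	/* next available object:     8 bytes */
	unsigned long tid;	/* unique transaction id:     8 bytes */
	struct page *page;	/* slab we allocate from:     8 bytes */
	struct page *partial;	/* per cpu partial pages:     8 bytes */
};
/*
 * 32 bytes today; dropping ->page and ->partial leaves 16 bytes, so the
 * per cpu state of four caches could share one 64-byte cacheline.
 */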

You are right, just dropping ->page won't do anything since the
kmem_cache_cpu struct is aligned to a double-word boundary.


Re: [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT)

2014-10-27 Thread Joonsoo Kim
On Fri, Oct 24, 2014 at 09:41:49AM -0500, Christoph Lameter wrote:
> > I found that you said retrieving the tid first is sufficient to do
> > things right in an old discussion. :)
> 
> Right, but the tid can be obtained from a different processor.
> 
> 
> One other aspect of this patchset is that it reduces the cache footprint
> of the alloc and free functions. This typically results in a performance
> increase for the allocator. If we can avoid the page_address() and
> virt_to_head_page() stuff that is required because we drop the ->page
> field in a sufficient number of places then this may be a benefit that
> goes beyond the RT and CONFIG_PREEMPT case.

Yeah... if we can avoid those function calls, it would be good.

But the current struct kmem_cache_cpu occupies just 32 bytes on a 64-bit
machine, which means just 1 cacheline. Reducing the size of the struct may
have no remarkable performance benefit in this case.

Thanks.


Re: [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT)

2014-10-27 Thread Joonsoo Kim
On Fri, Oct 24, 2014 at 09:02:18AM -0500, Christoph Lameter wrote:
> On Fri, 24 Oct 2014, Joonsoo Kim wrote:
> 
> > In this case, the object from cpu1's cpu_cache should be
> > different from cpu0's, so the allocation would fail.
> 
> That is true for most object pointers unless the value is NULL, which it
> can be. But if this is the only case then the second patch + your approach
> would work too.

Indeed... I missed the NULL value case.
Your second patch + mine would fix that situation, but I need to think
about it some more. :)

Thanks.


Re: [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT)

2014-10-24 Thread Christoph Lameter
> I found that you said retrieving the tid first is sufficient to do
> things right in an old discussion. :)

Right, but the tid can be obtained from a different processor.


One other aspect of this patchset is that it reduces the cache footprint
of the alloc and free functions. This typically results in a performance
increase for the allocator. If we can avoid the page_address() and
virt_to_head_page() stuff that is required because we drop the ->page
field in a sufficient number of places then this may be a benefit that
goes beyond the RT and CONFIG_PREEMPT case.


Re: [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT)

2014-10-24 Thread Christoph Lameter
On Fri, 24 Oct 2014, Joonsoo Kim wrote:

> In this case, the object from cpu1's cpu_cache should be
> different from cpu0's, so the allocation would fail.

That is true for most object pointers unless the value is NULL, which it
can be. But if this is the only case then the second patch + your approach
would work too.


Re: [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT)

2014-10-23 Thread Joonsoo Kim
On Thu, Oct 23, 2014 at 09:18:29AM -0500, Christoph Lameter wrote:
> On Thu, 23 Oct 2014, Joonsoo Kim wrote:
> 
> > Would disabling preemption across such a short piece of code really cause
> > a large problem for RT?
> 
> This is the hotpath and preempt enable/disable adds a significant number
> of cycles.
> 
> > And if page_address() and virt_to_head_page() remain as in the current
> > patchset implementation, this would perform worse than before.
> 
> Right.
> 
> > I looked at the patchset quickly and found another idea to remove the
> > preemption disable. How about just retrieving s->cpu_slab->tid first,
> > before accessing s->cpu_slab, in slab_alloc() and slab_free()?
> > The retrieved tid may ensure that migration to another CPU is detected,
> > so that we can remove the code for disabling preemption.
> 
> You cannot do any of these things because you need the tid from the right
> cpu, and the scheduler can preempt you and reschedule you on another
> processor at will. The tid and c may then be from different per cpu areas.

I found that you said retrieving the tid first is sufficient to do
things right in an old discussion. :)

https://lkml.org/lkml/2013/1/18/430

Think about the following 4 scenarios.

   TID    CPU_CACHE   CMPX_DOUBLE
1. cpu0   cpu0        cpu0
2. cpu0   cpu0        cpu1
3. cpu0   cpu1        cpu0
4. cpu0   cpu1        cpu1

1) has no problem and will succeed.
2) and 4) would fail due to the tid mismatch.
The only complicated case is scenario 3).

In this case, the object from cpu1's cpu_cache should be
different from cpu0's, so the allocation would fail.
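
The reason cases 2) and 4) fail is the tid check built into the fastpath
commit step, which looks roughly like this (paraphrased from the existing
slab_alloc_node() fastpath; the exact arguments in mm/slub.c may differ):

	/*
	 * Succeeds only if both words still hold the values sampled
	 * earlier: the freelist head AND the tid. With CONFIG_PREEMPT the
	 * per-cpu tid sequences are disjoint, so a tid sampled on cpu0 can
	 * never match cpu1's tid and the transaction fails, retrying from
	 * "redo".
	 */
	if (unlikely(!this_cpu_cmpxchg_double(
			s->cpu_slab->freelist, s->cpu_slab->tid,
			object, tid,
			next_object, next_tid(tid))))
		goto redo;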

The only problem with this method is that it's not easy to understand.

Am I missing something?

Thanks.


Re: [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT)

2014-10-23 Thread Christoph Lameter
On Thu, 23 Oct 2014, Joonsoo Kim wrote:

> Would disabling preemption across such a short piece of code really cause
> a large problem for RT?

This is the hotpath and preempt enable/disable adds a significant number
of cycles.

> And if page_address() and virt_to_head_page() remain as in the current
> patchset implementation, this would perform worse than before.

Right.

> I looked at the patchset quickly and found another idea to remove the
> preemption disable. How about just retrieving s->cpu_slab->tid first,
> before accessing s->cpu_slab, in slab_alloc() and slab_free()?
> The retrieved tid may ensure that migration to another CPU is detected,
> so that we can remove the code for disabling preemption.

You cannot do any of these things because you need the tid from the right
cpu, and the scheduler can preempt you and reschedule you on another
processor at will. The tid and c may then be from different per cpu areas.


Re: [RFC 0/4] [RFC] slub: Fastpath optimization (especially for RT)

2014-10-23 Thread Joonsoo Kim
On Wed, Oct 22, 2014 at 10:55:17AM -0500, Christoph Lameter wrote:
> We had to insert a preempt enable/disable in the fastpath a while ago. This
> was mainly due to a lot of state that is kept for allocating from the per
> cpu freelist. In particular, the page field is not covered by the
> this_cpu_cmpxchg used in the fastpath to do the necessary atomic state
> change for fastpath allocation and freeing.
> 
> This patch removes the need for the page field to describe the state of the
> per cpu list. The freelist pointer can be used to determine the page struct
> address if necessary.
> 
> However, currently this does not work for the termination value of a list,
> which is NULL and the same for all slab pages. If we use a valid pointer
> into the page and also set the last bit, then all freelist pointers can
> always be used to determine the address of the page struct and we will not
> need the page field anymore in the per cpu area for a slab. Testing for the
> end of the list is a test of whether the first bit is set.
> 
> So the first patch changes the termination pointer for freelists to do just
> that. The second removes the page field, and the third can then remove the
> preempt enable/disable.
> 
> There are currently a number of caveats because we are adding calls to
> page_address() and virt_to_head_page() in a number of code paths. These
> can hopefully be removed one way or the other.
> 
> Removing the ->page field reduces the cache footprint of the fastpath so
> hopefully overall allocator effectiveness will increase further. Also RT
> uses full preemption, which means that currently pretty expensive code has
> to be inserted into the fastpath. This approach allows the removal of that
> code and a corresponding performance increase.
> 
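
A minimal sketch of the terminator encoding described in the quoted cover
letter (helper names are hypothetical; the actual patches may pick a
different bit or base address):

/*
 * End-of-list marker: a pointer into the slab page with its low bit set.
 * Any freelist value then still identifies the page, and end-of-list
 * detection is a single bit test.
 */
static inline void *freelist_end_marker(void *addr_in_page)
{
	return (void *)((unsigned long)addr_in_page | 1UL);
}

static inline bool is_freelist_end(const void *fp)
{
	return (unsigned long)fp & 1UL;
}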

Hello, Christoph.

Would disabling preemption across such a short piece of code really cause
a large problem for RT?

And if page_address() and virt_to_head_page() remain as in the current
patchset implementation, this would perform worse than before.

I looked at the patchset quickly and found another idea to remove the
preemption disable. How about just retrieving s->cpu_slab->tid first,
before accessing s->cpu_slab, in slab_alloc() and slab_free()?
The retrieved tid may ensure that migration to another CPU is detected,
so that we can remove the code for disabling preemption.

The following patch implements the above idea.

Thanks.

->8
diff --git a/mm/slub.c b/mm/slub.c
index ae7b9f1..af622d8 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2386,28 +2386,21 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
s = memcg_kmem_get_cache(s, gfpflags);
 redo:
/*
+* The transaction ids are globally unique per cpu and per operation on
+* a per cpu queue. Thus they can be guarantee that the cmpxchg_double
+* occurs on the right processor and that there was no operation on the
+* linked list in between.
+*/
+   tid = this_cpu_read(s->cpu_slab->tid);
+
+   /*
 * Must read kmem_cache cpu data via this cpu ptr. Preemption is
 * enabled. We may switch back and forth between cpus while
 * reading from one cpu area. That does not matter as long
 * as we end up on the original cpu again when doing the cmpxchg.
-*
-* Preemption is disabled for the retrieval of the tid because that
-* must occur from the current processor. We cannot allow rescheduling
-* on a different processor between the determination of the pointer
-* and the retrieval of the tid.
 */
-   preempt_disable();
c = this_cpu_ptr(s->cpu_slab);
 
-   /*
-* The transaction ids are globally unique per cpu and per operation on
-* a per cpu queue. Thus they can be guarantee that the cmpxchg_double
-* occurs on the right processor and that there was no operation on the
-* linked list in between.
-*/
-   tid = c->tid;
-   preempt_enable();
-
object = c->freelist;
page = c->page;
if (unlikely(!object || !node_match(page, node))) {
@@ -2646,18 +2639,16 @@ static __always_inline void slab_free(struct kmem_cache *s,
slab_free_hook(s, x);
 
 redo:
+   tid = this_cpu_read(s->cpu_slab->tid);
+
/*
 * Determine the currently cpus per cpu slab.
 * The cpu may change afterward. However that does not matter since
 * data is retrieved via this pointer. If we are on the same cpu
 * during the cmpxchg then the free will succedd.
 */
-   preempt_disable();
c = this_cpu_ptr(s->cpu_slab);
 
-   tid = c->tid;
-   preempt_enable();
-
if (likely(page == c->page)) {
set_freepointer(s, object, c->freelist);
 