On 08/30/2013 07:44 PM, Gleb Natapov wrote:
> On Thu, Aug 29, 2013 at 08:02:30PM +0800, Xiao Guangrong wrote:
>> On 08/29/2013 07:33 PM, Xiao Guangrong wrote:
>>> On 08/29/2013 05:31 PM, Gleb Natapov wrote:
>>>> On Thu, Aug 29, 2013 at 02:50:51PM +0800, Xiao Guangrong wrote:
>>>>> After more thinking, I still think rcu_assign_pointer() is unneeded when 
>>>>> a entry
>>>>> is removed. The remove-API does not care the order between unlink the 
>>>>> entry and
>>>>> the changes to its fields. It is the caller's responsibility:
>>>>> - in the case of rcuhlist, the caller uses call_rcu()/synchronize_rcu(), 
>>>>> etc to
>>>>>   enforce all lookups exit and the later change on that entry is 
>>>>> invisible to the
>>>>>   lookups.
>>>>>
>>>>> - In the case of rculist_nulls, it seems refcounter is used to guarantee 
>>>>> the order
>>>>>   (see the example from Documentation/RCU/rculist_nulls.txt).
>>>>>
>>>>> - In our case, we allow the lookup to see the deleted desc even if it is 
>>>>> in slab cache
>>>>>   or its is initialized or it is re-added.
>>>>>
>>>> BTW is it a good idea? We can access deleted desc while it is allocated
>>>> and initialized to zero by kmem_cache_zalloc(), are we sure we cannot
>>>> see partially initialized desc->sptes[] entry? On related note what about
>>>> 32 bit systems, they do not have atomic access to desc->sptes[].
>>
>> Ah... wait. desc is a array of pointers:
>>
>> struct pte_list_desc {
>>      u64 *sptes[PTE_LIST_EXT];
>>      struct pte_list_desc *more;
>> };
>>
> Yep, I misread it to be u64 bits and wondered why do we use u64 to store
> pointers.
> 
>> assigning a pointer is aways aotomic, but we should carefully initialize it
>> as you said. I will introduce a constructor for desc slab cache which 
>> initialize
>> the struct like this:
>>
>> for (i = 0; i < PTE_LIST_EXT; i++)
>>      desc->sptes[i] = NULL;
>>
>> It is okay.
>>
> I hope slab does not write anything into allocated memory internally if
> constructor is present. 

If only constructor is present (no SLAB_DESTROY_BY_RCU), It'll temporarily
write the "poison" value into the memory then call the constructor to initialize
it again, e.g, in slab.c:

static void *cache_alloc_debugcheck_after(struct kmem_cache *cachep,
                                gfp_t flags, void *objp, unsigned long caller)
{
                if (cachep->flags & SLAB_POISON) {
                ......
                poison_obj(cachep, objp, POISON_INUSE);
                }
        ......
        if (cachep->ctor && cachep->flags & SLAB_POISON)
                cachep->ctor(objp);
}

But SLAB_DESTROY_BY_RCU can force the allocer to don't touch
the memory, this is true in our case.

> BTW do you know what happens when SLAB debug is enabled
> and SLAB_DESTROY_BY_RCU is set? 

When SLAB debug is enabled, these 3 flags may be set:
#define SLAB_DEBUG_FREE         0x00000100UL    /* DEBUG: Perform (expensive) 
checks on free */
#define SLAB_RED_ZONE           0x00000400UL    /* DEBUG: Red zone objs in a 
cache */
#define SLAB_POISON             0x00000800UL    /* DEBUG: Poison objects */

Only SLAB_POISON can write something into the memory, and ...

> Does poison value is written into freed
> object (freed to slab, but not yet to page allocator)?

SLAB_POISON is cleared if SLAB_DESTROY_BY_RCU is used.
- In slab,  There is the code in __kmem_cache_create():
        if (flags & SLAB_DESTROY_BY_RCU)
                BUG_ON(flags & SLAB_POISON);

- In slub, the code is in calculate_sizes():
        /*
         * Determine if we can poison the object itself. If the user of
         * the slab may touch the object after free or before allocation
         * then we should never poison the object itself.
         */
        if ((flags & SLAB_POISON) && !(flags & SLAB_DESTROY_BY_RCU) &&
                        !s->ctor)
                s->flags |= __OBJECT_POISON;
        else
                s->flags &= ~__OBJECT_POISON;

- in slob, it seems it does not support SLAB DEBUG.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to