Re: ucred holding patch, BDE version

John Baldwin Mon, 11 Feb 2002 20:33:40 -0800


On 12-Feb-02 Terry Lambert wrote:
> Then the combination of calls fails.  There is no case where
> it will succeed when previously it would not.  I would argue
> that the failure is a lost race condition that exists in the
> code anyway, and that this approach merely makes the race
> failure more reliably reproducible.  8-).
> 
> I think you could "fix" this case, in any event, by saving a
> reference to the credential at the start (or, better, as a
> result of the sleep, when you know it's going to happen),
> any time you are going to split a system call across multiple
> potential blocking operations.  That certainly is *NOT* as
> common to all system calls as this discussion implies.
> 
> 
>> The idea of per-thread ucred references is so that a thread will have the
>> same
>> credentials for the lifetime of a syscall so that the entire syscall is self
>> consistent.  It also means that except for when we update the ucred
>> reference,
>> we don't need locks to access thread credentials since the thread references
>> are to read-only credentials.  We discussed this on -arch _months_ ago and
>> everyone agreed with it then.
> 
> As long as the pointer updates are atomic, you don't need to
> hold a lock to reference it anyway.  You don't care over the
> update, since it's a drop of priviledge.


Ok, I'll use really small words and diagrams:

thread's 1 and 2 belong to process p
p->p_ucred starts pointing at memory X
CPU 1 is executing thread 1
CPU 2 is executing thread 2

CPU 3 is executing some other random code in the kernel
At start both thread 1 and thread 2 have td_ucred == X

CPU 1                   CPU 2                   CPU 3
seteuid(), p_ucred      in userland
now points to Y
...
seteuid(), p_ucred
now points to Z
...
some syscall, Y's
only remaining
ref is free'd
...                                             allocates some
                                                other kernel
                                                structure at Y
                        executes any syscall
                        because on non-i386
                        arch's, the value of
                        p_ucred may still be
                        sitting in the store
                        buffer and we aren't
                        using a lock, we may
                        get any of the values
                        X, Y, or Z for p_ucred
                        when we test to see
                        if it needs updating.
                        If we get X, we keep
                        using the old reference
                        (this is the acceptable
                        race from before).  If
                        we get Z, then we gain
                        a reference to the new
                        ucred.  If we get Y,
                        then we increment the
                        refcount of the ucred
                        formerly at Y, in fact
                        corrupting the memory
                        allocated by CPU 3!

                                                *BOOM*

Is that simple enough?

>> > Perhaps this dicsussion is enough impetus to justify
>> > revisiting the atomic_t type definitions, which would
>> > be useful as reference counted hold/release mechanisms
>> > that would obviate the need for locks here?  This would
>> > at least let you defer getting rid of the per thread
>> > cred instances until later.
>> 
>> All having the refcount_t and other refcount_* functions would do is let us
>> get
>> rid of the per-ucred mutex (actually a pool mutex, so the overhead per ucred
>> is
>> just a pointer right now).  It wouln't change the fact that we need a lock
>> to
>> make sure p_ucred is up to date before we read a value we need to depend on
>> or
>> actually use.
> 
> Not true.  You can take a reference to it, and the reference
> is guaranteed to not change out from under you, so long as
> it is counted.  Use of a stack variable to store the value
> over the count increment allows a compare after the increment,
> after which a retry is possible (if there was a race during
> the reference taking).  Thus there is no need for a lock, e.g.:
> 
>       cred *
>       addreftocred(cred **cpp)
>       {
>               cred *rcp;      /* stack variable */
> 
>       again:
>               rcp = *cp;
>               atomic_plusplus( rcp->count);   /* magic atomic crap */
>               if( rcp != *cp) {
>                       atomic_mimusminus( rcp->count);
>                       goto again;
>               }
> 
>               return( rcp);
>       }
> 
> This depends on the rarity of credential changes; in the
> degenerate case, where there is a credential change between
> each normal system call that is not a credential change, the
> overhead is immense.  But people who write code like that
> can be safely punished without fear.  8-).

This code is still broken.  How do you know that the value you are reading from
cp (well, cpp, looks like you have a typo :-P) isn't stale and that your
increment isn't corrupting something?  Esp. since you are using atomic ops that
means your increment might corrupt somethign long enough to become visible to
another CPU and really hose things.

There is a race condition in between that plusplus and minusminus (albeit a
small one) that is just _begging_ for a kernel panic.

> -- Terry

-- 

John Baldwin <[EMAIL PROTECTED]>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message

Re: ucred holding patch, BDE version

Reply via email to