On Fri, Feb 23, 2018 at 05:42:16PM +0000, Al Viro wrote:

>       4) the nasty one - shrink_dentry_list() evictions of zero-count 
> dentries.
> _That_ calls for careful use of RCU, etc. - none of the others need that.  
> Need
> to think how to deal with that sucker; in any case, I do not believe that 
> sharing
> said RCU use, etc. with any other cases would do anything other than 
> obfuscating
> the rest.

Arrrgh...  Actually, there's a nasty corner case where the variant in mainline 
broken.  Look:
        dentry placed on a shrink list
        we pick the fucker from the list and lock it.
        we call lock_parent() on it.
                dentry is not a root and it's not deleted, so we proceed.
                trylock fails.
                we grab rcu_read_lock()
                we drop dentry->d_lock
        on another CPU, something does e.g. d_prune_aliases() (or finds the
        sucker in hash and proceeds to unhash and dput(), etc.) - anything
        that evicts that dentry.  It is marked with DCACHE_MAY_FREE and left
        alone.  The parent, OTOH, is dropped and freeing gets scheduled as
        soon as RCU allows.
                we grab parent->d_lock
                we verify that dentry->d_parent is still the same (it is)
                we do rcu_read_unlock()
                we grab dentry->d_lock
                we return parent
At that point we are fucked - there's nothing to prevent parent from being
freed at any point.  And we assume that its ->d_lock is held and needs to
be dropped.

The call site in d_prune_aliases() avoids the same scenario, since there we
are already holding ->i_lock and another thread won't get to __dentry_kill()
until we are done with lock_parent().

Unless I'm missing something, that's a (narrow) memory corruptor.  The window is
narrow, but not impossibly so - if that other thread had been spinning on 
to grab dentry->d_lock in d_prune_alias(), it has to squeeze through
                if (!dentry->d_lockref.count) {
and then in lock_parent() called there - through
        if (IS_ROOT(dentry))
                return NULL;
        if (unlikely(dentry->d_lockref.count < 0))
                return NULL;
        if (likely(spin_trylock(&parent->d_lock)))
before the first CPU gets through
        parent = READ_ONCE(dentry->d_parent);

The first CPU can't be preempted, but there's nothing to prevent an IRQ arriving
at that point, letting the second one win the race.


I think the (untested) patch below is -stable fodder:

lock_parent() needs to recheck if dentry got __dentry_kill'ed under it

In case when dentry passed to lock_parent() is protected from freeing only
by the fact that it's on a shrink list and trylock of parent fails, we
could get hit by __dentry_kill() (and subsequent dentry_kill(parent))
between unlocking dentry and locking presumed parent.  We need to recheck
that dentry is alive once we lock both it and parent *and* postpone
rcu_read_unlock() until after that point.  Otherwise we could return
a pointer to struct dentry that already is rcu-scheduled for freeing, with
->d_lock held on it; caller's subsequent attempt to unlock it can end
up with memory corruption.

Signed-off-by: Al Viro <v...@zeniv.linux.org.uk>
diff --git a/fs/dcache.c b/fs/dcache.c
index 7c38f39958bc..32aaab21e648 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -647,11 +647,16 @@ static inline struct dentry *lock_parent(struct dentry 
                goto again;
-       rcu_read_unlock();
-       if (parent != dentry)
+       if (parent != dentry) {
                spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
-       else
+               if (unlikely(dentry->d_lockref.count < 0)) {
+                       spin_unlock(&parent->d_lock);
+                       parent = NULL;
+               }
+       } else {
                parent = NULL;
+       }
+       rcu_read_unlock();
        return parent;

Reply via email to