On Sun, Aug 09, 2015 at 07:49:08PM +0200, Manfred Spraul wrote:
> Hi Herton,
> 
(snip)
> >+++ b/ipc/sem.c
> >@@ -2074,17 +2074,24 @@ void exit_sem(struct task_struct *tsk)
> >             rcu_read_lock();
> >             un = list_entry_rcu(ulp->list_proc.next,
> >                                 struct sem_undo, list_proc);
> >-            if (&un->list_proc == &ulp->list_proc)
> >-                    semid = -1;
> >-             else
> >-                    semid = un->semid;
> >+            if (&un->list_proc == &ulp->list_proc) {
> >+                    rcu_read_unlock();
> >+                    /* Make sure we wait for any place still referencing
> >+                     * the current ulp to finish */
> >+                    synchronize_rcu();
> Sorry, no. synchronize_rcu() is a high-latency operation.
> We can't call it within exit_sem(). We could use kfree_rcu(), but I don't
> see that we need it:
> 
> Which race do you imagine?
> ulp is accessed by:
> - freeary(). Race impossible due to explicit locking.
> - exit_sem(). Race impossible due to ulp->refcount
> - find_alloc_undo(). Race impossible, because it operates on
> current->sysvsem.undo_list.
> "current" is in do_exit, thus can't be inside semtimedop().

Well, without the synchronize_rcu() and with the semid list loop fix I was
still able to reproduce problems, and I thought the cause was again a race
with IPC_RMID in freeary. Here is one scenario I can imagine:

               A                                              B

freeary()
  list_del(&un->list_id)
  spin_lock(&un->ulp->lock)
  un->semid = -1
  list_del_rcu(&un->list_proc)
    __list_del_entry(&un->list_proc)
      __list_del(entry->prev, entry->next)    exit_sem()
        next->prev = prev;                      ...
        prev->next = next;                      ...
        ...                                     un = list_entry_rcu(ulp->list_proc.next...)
    (&un->list_proc)->prev = LIST_POISON2       if (&un->list_proc == &ulp->list_proc)
  ...                                             <true, last un removed by thread A>
                                                kfree(ulp)
  spin_unlock(&un->ulp->lock) <---- bug

Now that is a very tight window, but I hit problems even when I first tried
this patch:

(...)
-               if (&un->list_proc == &ulp->list_proc)
-                       semid = -1;
-                else
-                       semid = un->semid;
+               if (&un->list_proc == &ulp->list_proc) {
+                       rcu_read_unlock();
+                       break;
+               }
+               spin_lock(&ulp->lock);
+               semid = un->semid;
+               spin_unlock(&ulp->lock);

+               /* exit_sem raced with IPC_RMID, nothing to do */
                if (semid == -1) {
                        rcu_read_unlock();
-                       break;
+                       synchronize_rcu();
+                       continue;
                }
(...)

So even with the bad/unneeded synchronize_rcu() which I had placed there, I
could still reproduce problems. (The testing of the patch above was on an
older kernel than latest upstream, from RHEL 6; I can retest without
synchronize_rcu() on latest upstream, but the affected code is the same.)
That's when I thought of the scenario above. I was able to get this oops:

<0>BUG: spinlock bad magic on CPU#3, test/5427 (Not tainted)
<4>general protection fault: 0000 [#1] SMP
...
<4>Pid: 5427, comm: test Not tainted 2.6.32-573.el6.1233300.2.x86_64.debug #1 QEMU Standard PC (i440FX + PIIX, 1996)
<4>RIP: 0010:[<ffffffff812c8291>]  [<ffffffff812c8291>] spin_bug+0x81/0x100
...
<4>Call Trace:
<4> [<ffffffff812c8375>] _raw_spin_unlock+0x65/0xa0
<4> [<ffffffff8157087b>] _spin_unlock+0x2b/0x40
<4> [<ffffffff81241a66>] freeary+0x96/0x250
<4> [<ffffffff81570a72>] ? _spin_lock+0x62/0x70
<4> [<ffffffff81241cad>] semctl_down.clone.0+0x8d/0x130
<4> [<ffffffff810af1a8>] ? sched_clock_cpu+0xb8/0x110
<4> [<ffffffff810bdb3d>] ? trace_hardirqs_off+0xd/0x10
<4> [<ffffffff810af2ef>] ? cpu_clock+0x6f/0x80
<4> [<ffffffff810c0c5d>] ? lock_release_holdtime+0x3d/0x190
<4> [<ffffffff81243a42>] sys_semctl+0x2a2/0x300
<4> [<ffffffff815702d2>] ? trace_hardirqs_on_thunk+0x3a/0x3f
<4> [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b

Maybe the spin_lock(&ulp->lock) below could be acquired earlier, before
checking whether list_proc is empty? That would avoid the need for
synchronize_rcu(), which was my initial idea for avoiding the problem.

> 
> >+                    break;
> >+            }
> >+            spin_lock(&ulp->lock);
> >+            semid = un->semid;
> >+            spin_unlock(&ulp->lock);
> Ok/good.
> Note (I've tried it first):
> Just "READ_ONCE(un->semid)" would be insufficient, because this can happen:
>  A: thread 1, within freeary:
>  A: spin_lock(&ulp->lock);
>  A: un->semid = -1;
>  B:    thread 2, within exit_sem():
>  B:    if (un->semid == -1) exit;
>  B:    kfree(ulp);
>  A: spin_unlock(&ulp->lock); <<<< use-after-free, bug
> 
> >+            /* exit_sem raced with IPC_RMID, nothing to do */
> >             if (semid == -1) {
> >                     rcu_read_unlock();
> >-                    break;
> >+                    continue;
> >             }
> >-            sma = sem_obtain_object_check(tsk->nsproxy->ipc_ns, un->semid);
> >+            sma = sem_obtain_object_check(tsk->nsproxy->ipc_ns, semid);
> >             /* exit_sem raced with IPC_RMID, nothing to do */
> >             if (IS_ERR(sma)) {
> >                     rcu_read_unlock();
> Ok.
> >@@ -2112,9 +2119,10 @@ void exit_sem(struct task_struct *tsk)
> >             ipc_assert_locked_object(&sma->sem_perm);
> >             list_del(&un->list_id);
> >-            spin_lock(&ulp->lock);
> >+            /* we should be the last process using this ulp, so no need
> >+             * to acquire ulp->lock here; we are also protected against
> >+             * IPC_RMID as we hold sma->sem_perm.lock */
> >             list_del_rcu(&un->list_proc);
> >-            spin_unlock(&ulp->lock);
> >             /* perform adjustments registered in un */
> >             for (i = 0; i < sma->sem_nsems; i++) {
> a) "we should be the last" or "we are the last"?

I'll change the comment to "we are the last".

> b) The bug that you have found is probably old, thus it must go into the
> stable kernels as well.
> I would not do this change together with the bugfix.
> 
> Perhaps make two patches? One cc stable, the other one without cc stable.

Ok, I'll separate the changes.

Thanks for reviewing.

> --
>     Manfred

-- 
[]'s
Herton