On Tue, Mar 29, 2005 at 01:49:26PM +0200, Ingo Molnar wrote:
> 
> (i guess the debug message should be extended to do a dump_stack() so 
> that we see which process does?)

Never mind.  I think I've found what it is.  The only thing I can't
figure out is why we're only seeing it now when this bug has been
around since day one.

In netlink_dump we're operating on sk after dropping the cb lock.
This is racy because the owner of the socket could close it after
we drop the cb lock.

This is possible because netlink_dump isn't always called from the
context of the process that owns the socket.  For instance, if there
is contention on rtnl then rtnetlink requests will be processed by
the process that currently holds the rtnl.

The solution is to hold a ref count on the socket before we drop
the cb lock.

Signed-off-by: Herbert Xu <[EMAIL PROTECTED]>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
===== net/netlink/af_netlink.c 1.72 vs edited =====
--- 1.72/net/netlink/af_netlink.c       2005-03-23 14:17:09 +11:00
+++ edited/net/netlink/af_netlink.c     2005-03-30 16:24:27 +10:00
@@ -1080,9 +1080,11 @@
        len = cb->dump(skb, cb);
 
        if (len > 0) {
+               sock_hold(sk);
                spin_unlock(&nlk->cb_lock);
                skb_queue_tail(&sk->sk_receive_queue, skb);
                sk->sk_data_ready(sk, len);
+               sock_put(sk);
                return 0;
        }
 
