FYI, I used the wrong mailing list address in my original mail.

---------- Forwarded message ----------
From: Mike Snitzer <[EMAIL PROTECTED]>
Date: Wed, Mar 26, 2008 at 2:43 PM
Subject: nbd: Oops because nbd doesn't prevent NBD_CLEAR_SOCK while
sock_xmit() is working on a receive
To: Paul Clements <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED]


I'm seeing that nbd_device's socket is getting set to NULL in the
 middle of nbd_read_stat()'s sock_xmit().

 There appears to be a race when 'nbd-client -d' first asks an NBD
 device to disconnect from the nbd-server (via the NBD_DISCONNECT
 ioctl) and then clears the device's socket, setting it to NULL (via
 NBD_CLEAR_SOCK).

 Both NBD_DISCONNECT and NBD_CLEAR_SOCK take the nbd_device's tx_lock
 (which protects the socket during transmits) _but_ receives do not
 take it, so the socket can be set to NULL (via NBD_CLEAR_SOCK) at any
 time while a receive is inside sock_xmit(); as such NBD_CLEAR_SOCK can
 cause a NULL pointer dereference in sock_xmit().
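
 To make the asymmetry concrete, here is a minimal user-space sketch,
 not the nbd driver itself. Every name in it (model_nbd, model_sock,
 model_send, model_clear_sock, model_recv_peek) is a hypothetical
 stand-in, and a pthread mutex plays the role of tx_lock. It only shows
 which paths hold the lock and which one reads the pointer without it.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel's struct layouts. */
struct model_sock {
    int allocation;            /* stands in for sock->sk->sk_allocation */
};

struct model_nbd {
    pthread_mutex_t tx_lock;   /* serializes transmits and the ioctl paths */
    struct model_sock *sock;   /* set to NULL by the clear-socket path */
};

/* Transmit path: holds tx_lock, so it cannot overlap model_clear_sock(). */
int model_send(struct model_nbd *d)
{
    int ret = -1;

    pthread_mutex_lock(&d->tx_lock);
    if (d->sock) {
        d->sock->allocation = 1;
        ret = 0;
    }
    pthread_mutex_unlock(&d->tx_lock);
    return ret;
}

/* Clear-socket path: also holds tx_lock while clearing the pointer. */
void model_clear_sock(struct model_nbd *d)
{
    pthread_mutex_lock(&d->tx_lock);
    d->sock = NULL;
    pthread_mutex_unlock(&d->tx_lock);
}

/*
 * Receive path: takes no lock. It returns the pointer the receive would
 * dereference next; if model_clear_sock() ran first, that pointer is NULL.
 */
struct model_sock *model_recv_peek(struct model_nbd *d)
{
    return d->sock;
}
```

 In this model the send path simply fails once the socket is cleared,
 while the receive path is handed a NULL pointer it would go on to
 dereference.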

 Analyzing the crash, it is clear that the NULL pointer dereference
 occurs when sock_xmit()'s do {} while() loop dereferences the
 nbd_device's socket with:
 sock->sk->sk_allocation = GFP_NOIO;
 I also saw that the sock_xmit() caller is nbd_read_stat().

 The sequence looks like this:

 nbd1: NBD_DISCONNECT
 [NOTE: a sock_xmit() send attempt is made on behalf of NBD_DISCONNECT]
 nbd1: Send control failed (result -32)
 ...
 [NBD is still dequeueing requests]
 ...
 Race: [NBD_CLEAR_SOCK ioctl][FATAL: nbd_read_stat()'s sock_xmit()
 receive attempt causes NULL pointer]

 In practice this looks like:

 nbd1: NBD_DISCONNECT
 nbd1: Send control failed (result -32)
 end_request: I/O error, dev nbd1, sector 0
 end_request: I/O error, dev nbd1, sector 8032264
 md: super_written gets error=-5, uptodate=0
 raid1: Disk failure on nbd1, disabling device.
        Operation continuing on 1 devices
 Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
  [<ffffffff88b1e125>] :nbd:sock_xmit+0x9d/0x301

 The fact that sock_xmit() in receive mode is unprotected seems to
 explain WHY a NULL pointer dereference is possible; but I'm still
 trying to identify HOW it happens.

 But for me this raises the question: why isn't the nbd_device's
 socket always protected during sock_xmit() for both transmits and
 receives, rather than just transmits (via tx_lock)?
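
 For discussion only, here is one shape such protection could take,
 again as a user-space sketch rather than a proposed patch. The
 assumption is that holding a lock across a blocking receive is
 undesirable, so the receive path instead takes a reference to the
 socket under the lock and then works on that reference. All names
 (guarded_nbd, guarded_sock, guarded_get, guarded_put, guarded_clear,
 guarded_recv) are hypothetical, and freeing the socket when the count
 reaches zero is left out.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model types; not the kernel's structures. */
struct guarded_sock {
    int refs;                  /* protected by guarded_nbd.lock */
    int allocation;            /* stands in for sk_allocation */
};

struct guarded_nbd {
    pthread_mutex_t lock;
    struct guarded_sock *sock; /* NULL once cleared */
};

/* Take a reference under the lock; returns NULL if already cleared. */
struct guarded_sock *guarded_get(struct guarded_nbd *d)
{
    struct guarded_sock *s;

    pthread_mutex_lock(&d->lock);
    s = d->sock;
    if (s)
        s->refs++;
    pthread_mutex_unlock(&d->lock);
    return s;
}

/* Drop a reference taken with guarded_get(). */
void guarded_put(struct guarded_nbd *d, struct guarded_sock *s)
{
    pthread_mutex_lock(&d->lock);
    s->refs--;
    pthread_mutex_unlock(&d->lock);
}

/* Clear path: detach the socket and drop the device's own reference. */
void guarded_clear(struct guarded_nbd *d)
{
    pthread_mutex_lock(&d->lock);
    if (d->sock) {
        d->sock->refs--;
        d->sock = NULL;
    }
    pthread_mutex_unlock(&d->lock);
}

/* Receive path: fails with -1 instead of dereferencing a NULL socket. */
int guarded_recv(struct guarded_nbd *d)
{
    struct guarded_sock *s = guarded_get(d);

    if (!s)
        return -1;
    s->allocation = 1;         /* our reference keeps s usable here */
    guarded_put(d, s);
    return 0;
}
```

 With this shape a clear that races with a receive only detaches the
 pointer; a receive that already holds a reference finishes on it, and
 a later receive sees NULL and returns an error.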

 Any help on the "right" fix would be appreciated, thanks.
 Mike
