Mike Snitzer wrote:
> In practice this looks like:
>
> nbd1: NBD_DISCONNECT
> nbd1: Send control failed (result -32)
> end_request: I/O error, dev nbd1, sector 0
> end_request: I/O error, dev nbd1, sector 8032264
> md: super_written gets error=-5, uptodate=0
> raid1: Disk failure on nbd1, disabling device.
>        Operation continuing on 1 devices
> Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
> [<ffffffff88b1e125>] :nbd:sock_xmit+0x9d/0x301
> The fact that sock_xmit() in receive mode is unprotected seems to be
> the WHY a NULL pointer is possible; but I'm still trying to identify
> the HOW.

Do you know who is setting the socket NULL? Is it already NULL when you
get to this point? Is it the nbd-client -d? Is it the original
nbd-client/kernel that does it? Figuring that out would help narrow
down the cause.

> But for me this begs the question: why isn't the nbd_device's socket
> always protected during sock_xmit() for both
> transmits and receives; rather than just transmits (via tx_lock)!?

It would deadlock if we held the lock over both. Generally we don't
have to worry about receives, since they're always done in the
nbd-client process, so we have control over when and how it exits and
cleans up. The odd case, as you've discovered, is when another process
(nbd-client -d) comes along and starts mucking with the queue and
socket.

Would "kill -9 <nbd-client-pid>" work for you instead? That is what I
use to break the connection, and it's safe, as it tells the original
nbd-client to exit (which it does cleanly and safely).

--
Paul
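
To illustrate the deadlock point above, here is a rough user-space sketch in Python. It is not the nbd driver code, and the names (round_trip_completes, lock_receives) are made up for the illustration. It assumes the problem is the usual one: the receiver blocks waiting for a reply while holding the lock, so the sender can never take the lock to transmit the request that would produce that reply. Timeouts stand in for "blocked forever" so the script terminates.

```python
import socket
import threading


def round_trip_completes(lock_receives: bool, timeout: float = 0.5) -> bool:
    """Return True if one request/reply round trip finishes.

    lock_receives=True models holding the same lock over sends and
    receives; lock_receives=False models locking transmits only.
    """
    client, server = socket.socketpair()
    # Timeouts stand in for "blocks forever" so the demo can exit.
    client.settimeout(2 * timeout)
    server.settimeout(2 * timeout)

    lock = threading.Lock()          # plays the role of tx_lock
    receiver_ready = threading.Event()
    got_reply = threading.Event()

    def echo_server() -> None:
        # The peer only replies after it has seen a request.
        try:
            data = server.recv(16)
            if data:
                server.sendall(data)
        except OSError:
            pass  # timed out: no request ever arrived

    def receiver() -> None:
        if lock_receives:
            lock.acquire()           # hold the lock across the blocking recv
        receiver_ready.set()
        try:
            if client.recv(16):
                got_reply.set()
        except OSError:
            pass  # timed out: no reply ever arrived
        finally:
            if lock_receives:
                lock.release()

    threads = [threading.Thread(target=echo_server),
               threading.Thread(target=receiver)]
    for t in threads:
        t.start()

    # Sender: must take the lock before transmitting.
    receiver_ready.wait()
    if lock.acquire(timeout=timeout):
        try:
            client.sendall(b"request")
        finally:
            lock.release()

    for t in threads:
        t.join()
    client.close()
    server.close()
    return got_reply.is_set()


if __name__ == "__main__":
    print("lock on transmit only:", round_trip_completes(False))
    print("lock on send and receive:", round_trip_completes(True))
```

With the lock held only around the send, the round trip completes. With the lock also held around the receive, the sender never gets the lock and no reply ever arrives.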
