Package: src:linux
Version: 3.16.7-2
Severity: normal
Tags: patch

Dear Maintainer,
the nbd timeout setting passed from nbd-client to the kernel is broken in the jessie kernel. This renders raid1 on top of nbd devices useless: when the network connection or the nbd-server fails, the device simply hangs until the connection to the nbd-server is brought back to life, which is clearly not what is intended when using a raid1.

This worked in lenny and has been broken since wheezy.

Appended is the patch obtained from the nbd subsystem maintainers' list. With it applied I was able to rebuild the jessie kernel package, and the timeout works again.

If you need further information or references, please let me know.

Many thanks, greetings
Hermann

-- Package-specific info:
** Version:
Linux version 3.16.0-4-amd64 ([email protected]) (gcc version 4.8.3 (Debian 4.8.3-13) ) #1 SMP Debian 3.16.7-2 (2014-11-06)

** Command line:
BOOT_IMAGE=/boot/vmlinuz-3.16.0-4-amd64 root=UUID=7695ce6d-8761-4755-b460-8e0bcd26f176 ro quiet

** Not tainted

** Kernel log:
[ 0.533678] Freeing unused kernel memory: 940K (ffff880001515000 - ffff880001600000)
[ 0.534261] Freeing unused kernel memory: 228K (ffff8800017c7000 - ffff880001800000)
[ 0.588448] systemd-udevd[52]: starting version 215
[ 0.589295] random: systemd-udevd urandom read with 2 bits of entropy available
[ 0.636304] SCSI subsystem initialized
[ 0.645829] 8139cp: 8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004)
[ 0.646259] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 10
[ 0.652879] ACPI: bus type USB registered
[ 0.652904] usbcore: registered new interface driver usbfs
[ 0.652925] usbcore: registered new interface driver hub
[ 0.654698] 8139cp 0000:00:03.0 eth0: RTL-8139C+ at 0xffffc90000002000, 54:52:00:af:12:a2, IRQ 10
[ 0.656747] FDC 0 is a S82078B
[ 0.658026] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
[ 0.658696] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 11
[ 0.660542] usbcore: registered new device driver usb
[ 0.661896] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[ 0.663074] uhci_hcd: USB Universal Host Controller Interface driver
[ 0.663357] uhci_hcd 0000:00:01.2: UHCI Host Controller
[ 0.663365] uhci_hcd 0000:00:01.2: new USB bus registered, assigned bus number 1
[ 0.663401] uhci_hcd 0000:00:01.2: detected 2 ports
[ 0.663507] uhci_hcd 0000:00:01.2: irq 11, io base 0x0000c140
[ 0.665457] 8139too: 8139too Fast Ethernet driver 0.9.28
[ 0.666046] libata version 3.00 loaded.
[ 0.673509] usb usb1: New USB device found, idVendor=1d6b, idProduct=0001
[ 0.673513] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 0.673515] usb usb1: Product: UHCI Host Controller
[ 0.673517] usb usb1: Manufacturer: Linux 3.16.0-4-amd64 uhci_hcd
[ 0.673518] usb usb1: SerialNumber: 0000:00:01.2
[ 0.674124] hub 1-0:1.0: USB hub found
[ 0.674132] hub 1-0:1.0: 2 ports detected
[ 0.675157] ata_piix 0000:00:01.1: version 2.13
[ 0.684687] scsi0 : ata_piix
[ 0.693987] scsi1 : ata_piix
[ 0.694036] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc1a0 irq 14
[ 0.694039] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc1a8 irq 15
[ 0.698493] virtio-pci 0000:00:04.0: irq 40 for MSI/MSI-X
[ 0.698522] virtio-pci 0000:00:04.0: irq 41 for MSI/MSI-X
[ 0.698550] virtio-pci 0000:00:04.0: irq 42 for MSI/MSI-X
[ 0.703134] vda: vda1 vda2 < vda5 >
[ 0.885202] nbd: registered device at major 43
[ 0.926591] PM: Starting manual resume from disk
[ 0.926596] PM: Hibernation image partition 254:5 present
[ 0.926598] PM: Looking for hibernation image.
[ 0.926745] PM: Image not found (code -22)
[ 0.926748] PM: Hibernation image not present or could not be loaded.
[ 0.959121] EXT4-fs (vda1): mounted filesystem with ordered data mode. Opts: (null)
[ 1.161669] systemd[1]: Cannot add dependency job for unit display-manager.service, ignoring: Unit display-manager.service failed to load: No such file or directory.
[ 1.344269] systemd-udevd[137]: starting version 215
[ 1.437657] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input2
[ 1.437663] ACPI: Power Button [PWRF]
[ 1.493314] piix4_smbus 0000:00:01.3: SMBus Host Controller at 0xb100, revision 0
[ 1.496327] tsc: Refined TSC clocksource calibration: 2659.670 MHz
[ 1.531351] [drm] Initialized drm 1.1.0 20060810
[ 1.549336] input: PC Speaker as /devices/platform/pcspkr/input/input4
[ 1.610390] Adding 138236k swap on /dev/vda5. Priority:-1 extents:1 across:138236k FS
[ 1.685921] ppdev: user-space parallel port driver
[ 1.838506] EXT4-fs (vda1): re-mounted. Opts: errors=remount-ro
[ 2.259620] 8139cp 0000:00:03.0 eth0: link up, 100Mbps, full-duplex, lpa 0x05E1
[ 2.271080] systemd-journald[128]: Received request to flush runtime journal from PID 1
[ 2.370535] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3
[ 2.402626] RPC: Registered named UNIX socket transport module.
[ 2.402630] RPC: Registered udp transport module.
[ 2.402631] RPC: Registered tcp transport module.
[ 2.402632] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 2.413994] FS-Cache: Loaded
[ 2.424204] FS-Cache: Netfs 'nfs' registered for caching
[ 2.441518] Installing knfsd (copyright (C) 1996 [email protected]).
[ 123.184988] block nbd0: NBD_SET_TIMEOUT: 0 -> 5
[ 124.198368] random: nonblocking pool is initialized
[ 124.201614] nbd0: unknown partition table
[ 554.016041] nbd: killing hung xmit (nbd-client, pid: 745)
[ 554.016928] nbd (pid 745: nbd-client) got signal 9
[ 554.016934] block nbd0: shutting down socket
[ 554.017044] block nbd0: Receive control failed (result -4)
[ 554.017480] end_request: I/O error, dev nbd0, sector 29514736
[ 554.017861] Buffer I/O error on device nbd0, logical block 14757368
[ 554.018259] Buffer I/O error on device nbd0, logical block 14757369
[ 554.018656] Buffer I/O error on device nbd0, logical block 14757370
[ 554.019053] Buffer I/O error on device nbd0, logical block 14757371
[ 554.019452] Buffer I/O error on device nbd0, logical block 14757372
[ 554.019849] Buffer I/O error on device nbd0, logical block 14757373
[ 554.020028] Buffer I/O error on device nbd0, logical block 14757374
[ 554.020028] Buffer I/O error on device nbd0, logical block 14757375
[ 554.020028] Buffer I/O error on device nbd0, logical block 14757376
[ 554.020028] Buffer I/O error on device nbd0, logical block 14757377
[ 554.021951] end_request: I/O error, dev nbd0, sector 29514992
[ 554.022368] block nbd0: queue cleared
[ 554.023825] block nbd0: Attempted send on closed socket
[ 554.024221] end_request: I/O error, dev nbd0, sector 29514736
[ 554.024921] block nbd0: Attempted send on closed socket
[ 554.025297] end_request: I/O error, dev nbd0, sector 29514738
[ 554.025678] block nbd0: Attempted send on closed socket
[ 554.026034] end_request: I/O error, dev nbd0, sector 29514740
[ 554.026413] block nbd0: Attempted send on closed socket
[ 554.026768] end_request: I/O error, dev nbd0, sector 29514742
[ 786.935693] block nbd0: NBD_DISCONNECT
[ 786.948625] nbd: unregistered device at major 43
[ 801.888818] nbd: registered device at major 43
[ 801.952479] block nbd0: NBD_SET_TIMEOUT: 0 -> 5
[ 801.968453] nbd0: unknown partition table

** Model information
sys_vendor: Bochs
product_name: Bochs
product_version:
chassis_vendor: Bochs
chassis_version:
bios_vendor: Bochs
bios_version: Bochs

** Loaded modules:
nbd nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc md_mod ppdev ttm pcspkr drm_kms_helper evdev psmouse serio_raw virtio_balloon i2c_piix4 drm i2c_core parport_pc parport processor button thermal_sys autofs4 ext4 crc16 mbcache jbd2 ata_generic virtio_blk virtio_net ata_piix uhci_hcd ehci_hcd 8139too virtio_pci virtio_ring virtio 8139cp mii floppy libata usbcore usb_common scsi_mod

** PCI devices:
00:00.0 Host bridge [0600]: Intel Corporation 440FX - 82441FX PMC [Natoma] [8086:1237] (rev 02)
	Subsystem: Red Hat, Inc Qemu virtual machine [1af4:1100]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0

00:01.0 ISA bridge [0601]: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] [8086:7000]
	Subsystem: Red Hat, Inc Qemu virtual machine [1af4:1100]
	Physical Slot: 1
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:01.1 IDE interface [0101]: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] [8086:7010] (prog-if 80 [Master])
	Subsystem: Red Hat, Inc Qemu virtual machine [1af4:1100]
	Physical Slot: 1
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Region 0: [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8]
	Region 1: [virtual] Memory at 000003f0 (type 3, non-prefetchable)
	Region 2: [virtual] Memory at 00000170 (32-bit, non-prefetchable) [size=8]
	Region 3: [virtual] Memory at 00000370 (type 3, non-prefetchable)
	Region 4: I/O ports at c1a0 [size=16]
	Kernel driver in use: ata_piix

00:01.2 USB controller [0c03]: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] [8086:7020] (rev 01) (prog-if 00 [UHCI])
	Subsystem: Red Hat, Inc QEMU Virtual Machine [1af4:1100]
	Physical Slot: 1
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin D routed to IRQ 11
	Region 4: I/O ports at c140 [size=32]
	Kernel driver in use: uhci_hcd

00:01.3 Bridge [0680]: Intel Corporation 82371AB/EB/MB PIIX4 ACPI [8086:7113] (rev 03)
	Subsystem: Red Hat, Inc Qemu virtual machine [1af4:1100]
	Physical Slot: 1
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 9
	Kernel driver in use: piix4_smbus

00:02.0 VGA compatible controller [0300]: Cirrus Logic GD 5446 [1013:00b8] (prog-if 00 [VGA controller])
	Subsystem: Red Hat, Inc QEMU Virtual Machine [1af4:1100]
	Physical Slot: 2
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Region 0: Memory at fc000000 (32-bit, prefetchable) [size=32M]
	Region 1: Memory at febfd000 (32-bit, non-prefetchable) [size=4K]
	Expansion ROM at <unassigned> [disabled]

00:03.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL-8100/8101L/8139 PCI Fast Ethernet Adapter [10ec:8139] (rev 20)
	Subsystem: Red Hat, Inc QEMU Virtual Machine [1af4:1100]
	Physical Slot: 3
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 10
	Region 0: I/O ports at c000 [size=256]
	Region 1: Memory at febfe000 (32-bit, non-prefetchable) [size=256]
	Kernel driver in use: 8139cp

00:04.0 Ethernet controller [0200]: Red Hat, Inc Virtio network device [1af4:1000]
	Subsystem: Red Hat, Inc Device [1af4:0001]
	Physical Slot: 4
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 11
	Region 0: I/O ports at c160 [size=32]
	Region 1: Memory at febff000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: <access denied>
	Kernel driver in use: virtio-pci

00:05.0 SCSI storage controller [0100]: Red Hat, Inc Virtio block device [1af4:1001]
	Subsystem: Red Hat, Inc Device [1af4:0002]
	Physical Slot: 5
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 10
	Region 0: I/O ports at c100 [size=64]
	Kernel driver in use: virtio-pci

00:06.0 RAM memory [0500]: Red Hat, Inc Virtio memory balloon [1af4:1002]
	Subsystem: Red Hat, Inc Device [1af4:0005]
	Physical Slot: 6
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 11
	Region 0: I/O ports at c180 [size=32]
	Kernel driver in use: virtio-pci

** USB devices:
Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub

-- System Information:
Debian Release: jessie/sid
  APT prefers testing-updates
  APT policy: (500, 'testing-updates'), (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/1 CPU core)
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages linux-image-3.16.0-4-amd64 depends on:
ii  debconf [debconf-2.0]                   1.5.53
ii  initramfs-tools [linux-initramfs-tool]  0.116
ii  kmod                                    18-3
ii  linux-base                              3.5

Versions of packages linux-image-3.16.0-4-amd64 recommends:
ii  firmware-linux-free  3.3
ii  irqbalance           1.0.6-3

Versions of packages linux-image-3.16.0-4-amd64 suggests:
pn  debian-kernel-handbook  <none>
ii  grub-pc                 2.02~beta2-15
pn  linux-doc-3.16          <none>

Versions of packages linux-image-3.16.0-4-amd64 is related to:
pn  firmware-atheros        <none>
pn  firmware-bnx2           <none>
pn  firmware-bnx2x          <none>
pn  firmware-brcm80211      <none>
pn  firmware-intelwimax     <none>
pn  firmware-ipw2x00        <none>
pn  firmware-ivtv           <none>
pn  firmware-iwlwifi        <none>
pn  firmware-libertas       <none>
pn  firmware-linux          <none>
pn  firmware-linux-nonfree  <none>
pn  firmware-myricom        <none>
pn  firmware-netxen         <none>
pn  firmware-qlogic         <none>
pn  firmware-ralink         <none>
pn  firmware-realtek        <none>
pn  xen-hypervisor          <none>

-- debconf information:
  linux-image-3.16.0-4-amd64/postinst/depmod-error-initrd-3.16.0-4-amd64: false
  linux-image-3.16.0-4-amd64/postinst/mips-initrd-3.16.0-4-amd64:
  linux-image-3.16.0-4-amd64/prerm/removing-running-kernel-3.16.0-4-amd64: true

commit 6b5f5a68e8da4bc8d948f25b21dcd6eeeb16ae7d
Author: Michal Belczyk <[email protected]>
Date:   Tue Nov 18 10:50:19 2014 +0100

    nbd: improve request timeouts handling

    The main idea behind it is to be able to quickly detect broken replica
    and switch over to another when used with any sort of mirror type device
    built on top of any number of nbd devices.

    Before this change a request would time out causing the socket to be
    shut down and the device to fail in case of a dead server or removed
    network cable only if:

      a) either the timer around kernel_sendmsg() kicked in
      b) or the TCP failures on retransmission finally caused an error
         on the socket, likely blocked on kernel_recvmsg() at this time,
         waiting for replies from the server

    Case a) depends mostly on the size of requests issued and on the maximum
    size of the socket buffer -- a lot of read request headers or small write
    requests could be "sent" without triggering the requested timeout

    Case b) timeout is independent of nbd-client -t <timeout> option
    as there is no TCP_USER_TIMEOUT set on the client socket by default.
    And even if such timeout was set it would not solve the problem of
    an nbd-client hung on receiving replies for much longer time without
    setting TCP keep-alives... and that would be the third, independent
    timeout setting required to make it work "almost" as expected...

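To illustrate the point made under case b), here is a minimal userspace sketch of the socket-level settings the commit message alludes to: `TCP_USER_TIMEOUT` for unacknowledged sends plus keep-alives for an idle receive path. The helper name `set_tcp_failure_timeouts` and the probe interval and count are assumptions chosen for illustration; nothing here is taken from nbd-client, and the patch deliberately does not go this route.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of nbd-client): bound how long a dead
 * peer can go unnoticed on a TCP socket. timeout_sec must be >= 1.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int set_tcp_failure_timeouts(int fd, unsigned int timeout_sec)
{
	int on = 1;
	int idle = (int)timeout_sec;	/* idle seconds before first probe */
	int intvl = 1;			/* seconds between probes (assumed) */
	int cnt = 3;			/* lost probes before reset (assumed) */
	unsigned int user_ms = timeout_sec * 1000U;

	if (timeout_sec == 0) {
		errno = EINVAL;
		return -1;
	}

	/* Keep-alives cover a receiver that is blocked with nothing to send. */
	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
		return -1;

	/* TCP_USER_TIMEOUT (milliseconds) covers data sent but never ACKed. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
		       &user_ms, sizeof(user_ms)) < 0)
		return -1;

	return 0;
}
```

Even with all of these set, the effective failure-detection time is spread over several independent knobs, which is the situation the commit message describes as working only "almost" as expected.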
    So, instead, take the big hammer approach and:

      *) trace the number of outstanding requests sent to the server
         (nbd->inflight)
      *) enable the timer (nbd->req_timer) before the first request
         is submitted and leave it enabled
      *) when sending next request do not touch the timer (it is up to
         the receiving side to control it at this point)
      *) on receive side update the timer every time a response is collected
         but there are more to read from the server
      *) disable the timer whenever the inflight counter drops to zero
         or an error (leading to the socket shutdown) is returned

    This patch does NOT prevent the server to process a request for longer
    than the timeout specified if only it replies to any other request
    submitted within the timeout (the server still may reply to a batch
    of requests in any order).

    Only the nbd->xmit_timeout != 0 code path is changed so the patch should
    not affect nbd connections running without an explicit timeout set
    on the nbd-client command line.

    There is also no way to enable or disable the timeout on an active
    (nbd->pid != 0) nbd device, it is however possible to change its value.
    Otherwise the inflight request counter would have to affect the nbd devices
    enabled without nbd-client -t <timeout>.

    Also move nbd->pid modifications behind nbd->tx_lock wherever possible
    to avoid races between the concurrent nbd-client invocations.

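The timer rules listed in the commit message can be sketched as a small userspace model. This is only an illustration under stated assumptions: `struct req_timer_model` and the `model_*` functions are hypothetical names, time is a plain integer passed in by the caller, and there is no locking, whereas the patch itself uses a kernel timer protected by `nbd->timer_lock`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the inflight counter and request timer. */
struct req_timer_model {
	long timeout;	/* like nbd->xmit_timeout; 0 means timeouts are off */
	int inflight;	/* requests sent but not yet answered */
	bool armed;	/* whether the timer is currently pending */
	long expires;	/* absolute expiry time while armed */
	bool timedout;	/* set once the timer has fired */
};

/* Sending: arm the timer only for the first outstanding request. */
static void model_send(struct req_timer_model *m, long now)
{
	if (!m->timeout)
		return;
	if (m->inflight == 0) {
		m->armed = true;
		m->expires = now + m->timeout;
	}
	m->inflight++;
}

/* Receiving a reply: push the deadline out if more replies are expected,
 * otherwise stop the timer. */
static void model_reply(struct req_timer_model *m, long now)
{
	if (!m->timeout)
		return;
	assert(m->inflight > 0);
	m->inflight--;
	if (m->inflight) {
		m->armed = true;
		m->expires = now + m->timeout;
	} else {
		m->armed = false;
	}
}

/* Advance the clock; returns true once the device counts as timed out. */
static bool model_tick(struct req_timer_model *m, long now)
{
	if (m->armed && now >= m->expires) {
		m->timedout = true;
		m->armed = false;
	}
	return m->timedout;
}
```

In this model a second `model_send()` does not move the deadline, only a reply does. That matches the caveat in the commit message: a slow request is tolerated for as long as the server keeps answering other requests within the timeout.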
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 4bc2a5c..cc4a98a 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -140,11 +140,24 @@ static void sock_shutdown(struct nbd_device *nbd, int lock)
 
 static void nbd_xmit_timeout(unsigned long arg)
 {
-	struct task_struct *task = (struct task_struct *)arg;
+	struct nbd_device *nbd = (struct nbd_device *)arg;
+	struct task_struct *task_ary[2];
+	unsigned long flags;
+	int i;
 
-	printk(KERN_WARNING "nbd: killing hung xmit (%s, pid: %d)\n",
-		task->comm, task->pid);
-	force_sig(SIGKILL, task);
+	spin_lock_irqsave(&nbd->timer_lock, flags);
+	nbd->timedout = 1;
+	task_ary[0] = nbd->sender;
+	task_ary[1] = nbd->receiver;
+	for (i = 0; i < 2; i++) {
+		if (task_ary[i] == NULL)
+			continue;
+		printk(KERN_WARNING "nbd: killing hung xmit (%s, pid: %d)\n",
+			task_ary[i]->comm, task_ary[i]->pid);
+		force_sig(SIGKILL, task_ary[i]);
+		break;
+	}
+	spin_unlock_irqrestore(&nbd->timer_lock, flags);
 }
 
 /*
@@ -158,7 +171,7 @@ static int sock_xmit(struct nbd_device *nbd, int send, void *buf, int size,
 	struct msghdr msg;
 	struct kvec iov;
 	sigset_t blocked, oldset;
-	unsigned long pflags = current->flags;
+	unsigned long flags, pflags = current->flags;
 
 	if (unlikely(!sock)) {
 		dev_err(disk_to_dev(nbd->disk),
@@ -183,23 +196,39 @@ static int sock_xmit(struct nbd_device *nbd, int send, void *buf, int size,
 		msg.msg_controllen = 0;
 		msg.msg_flags = msg_flags | MSG_NOSIGNAL;
 
-		if (send) {
-			struct timer_list ti;
-
-			if (nbd->xmit_timeout) {
-				init_timer(&ti);
-				ti.function = nbd_xmit_timeout;
-				ti.data = (unsigned long)current;
-				ti.expires = jiffies + nbd->xmit_timeout;
-				add_timer(&ti);
+		if (nbd->xmit_timeout) {
+			spin_lock_irqsave(&nbd->timer_lock, flags);
+			if (nbd->timedout) {
+				spin_unlock_irqrestore(&nbd->timer_lock, flags);
+				printk(KERN_WARNING
+				       "nbd (pid %d: %s) timed out\n",
+				       task_pid_nr(current), current->comm);
+				result = -EINTR;
+				sock_shutdown(nbd, !send);
+				break;
 			}
+			if (send)
+				nbd->sender = current;
+			else
+				nbd->receiver = current;
+			spin_unlock_irqrestore(&nbd->timer_lock, flags);
+		}
+
+		if (send)
 			result = kernel_sendmsg(sock, &msg, &iov, 1, size);
-			if (nbd->xmit_timeout)
-				del_timer_sync(&ti);
-		} else
+		else
 			result = kernel_recvmsg(sock, &msg, &iov, 1, size,
 						msg.msg_flags);
 
+		if (nbd->xmit_timeout) {
+			spin_lock_irqsave(&nbd->timer_lock, flags);
+			if (send)
+				nbd->sender = NULL;
+			else
+				nbd->receiver = NULL;
+			spin_unlock_irqrestore(&nbd->timer_lock, flags);
+		}
+
 		if (signal_pending(current)) {
 			siginfo_t info;
 			printk(KERN_WARNING "nbd (pid %d: %s) got signal %d\n",
@@ -226,12 +255,12 @@ static int sock_xmit(struct nbd_device *nbd, int send, void *buf, int size,
 }
 
 static inline int sock_send_bvec(struct nbd_device *nbd, struct bio_vec *bvec,
-		int flags)
+		int msg_flags)
 {
 	int result;
 	void *kaddr = kmap(bvec->bv_page);
 	result = sock_xmit(nbd, 1, kaddr + bvec->bv_offset,
-			   bvec->bv_len, flags);
+			   bvec->bv_len, msg_flags);
 	kunmap(bvec->bv_page);
 	return result;
 }
@@ -239,9 +268,9 @@ static inline int sock_send_bvec(struct nbd_device *nbd, struct bio_vec *bvec,
 /* always call with the tx_lock held */
 static int nbd_send_req(struct nbd_device *nbd, struct request *req)
 {
-	int result, flags;
+	int result, msg_flags;
 	struct nbd_request request;
-	unsigned long size = blk_rq_bytes(req);
+	unsigned long flags, size = blk_rq_bytes(req);
 
 	memset(&request, 0, sizeof(request));
 	request.magic = htonl(NBD_REQUEST_MAGIC);
@@ -253,6 +282,19 @@ static int nbd_send_req(struct nbd_device *nbd, struct request *req)
 	}
 	memcpy(request.handle, &req, sizeof(req));
 
+	if (nbd->xmit_timeout) {
+		spin_lock_irqsave(&nbd->timer_lock, flags);
+		if (!nbd->inflight) {
+			nbd->req_timer.function = nbd_xmit_timeout;
+			nbd->req_timer.data = (unsigned long)nbd;
+			nbd->req_timer.expires = jiffies + nbd->xmit_timeout;
+			add_timer(&nbd->req_timer);
+		}
+		nbd->inflight++;
+		BUG_ON(nbd->inflight <= 0);
+		spin_unlock_irqrestore(&nbd->timer_lock, flags);
+	}
+
 	dprintk(DBG_TX, "%s: request %p: sending control (%s@%llu,%uB)\n",
 			nbd->disk->disk_name, req,
 			nbdcmd_to_ascii(nbd_cmd(req)),
@@ -274,12 +316,12 @@ static int nbd_send_req(struct nbd_device *nbd, struct request *req)
 		 * whether to set MSG_MORE or not...
 		 */
 		rq_for_each_segment(bvec, req, iter) {
-			flags = 0;
+			msg_flags = 0;
 			if (!rq_iter_last(bvec, iter))
-				flags = MSG_MORE;
+				msg_flags = MSG_MORE;
 			dprintk(DBG_TX, "%s: request %p: sending %d bytes data\n",
 					nbd->disk->disk_name, req, bvec.bv_len);
-			result = sock_send_bvec(nbd, &bvec, flags);
+			result = sock_send_bvec(nbd, &bvec, msg_flags);
 			if (result <= 0) {
 				dev_err(disk_to_dev(nbd->disk),
 					"Send data failed (result %d)\n",
@@ -291,6 +333,14 @@ static int nbd_send_req(struct nbd_device *nbd, struct request *req)
 	return 0;
 
 error_out:
+	if (nbd->xmit_timeout) {
+		spin_lock_irqsave(&nbd->timer_lock, flags);
+		nbd->inflight--;
+		BUG_ON(nbd->inflight < 0);
+		if (!nbd->inflight)
+			del_timer_sync(&nbd->req_timer);
+		spin_unlock_irqrestore(&nbd->timer_lock, flags);
+	}
 	return -EIO;
 }
 
@@ -412,24 +462,41 @@ static struct device_attribute pid_attr = {
 static int nbd_do_it(struct nbd_device *nbd)
 {
 	struct request *req;
+	unsigned long flags;
 	int ret;
 
 	BUG_ON(nbd->magic != NBD_MAGIC);
 
 	sk_set_memalloc(nbd->sock->sk);
-	nbd->pid = task_pid_nr(current);
 	ret = device_create_file(disk_to_dev(nbd->disk), &pid_attr);
 	if (ret) {
 		dev_err(disk_to_dev(nbd->disk), "device_create_file failed!\n");
-		nbd->pid = 0;
 		return ret;
 	}
 
-	while ((req = nbd_read_stat(nbd)) != NULL)
-		nbd_end_request(req);
+	for (;;) {
+		req = nbd_read_stat(nbd);
+		if (nbd->xmit_timeout) {
+			spin_lock_irqsave(&nbd->timer_lock, flags);
+			if (req != NULL) {
+				nbd->inflight--;
+				BUG_ON(nbd->inflight < 0);
+			}
+			if (req != NULL && nbd->inflight)
+				mod_timer(&nbd->req_timer,
+					  jiffies + nbd->xmit_timeout);
+			else
+				del_timer_sync(&nbd->req_timer);
+			spin_unlock_irqrestore(&nbd->timer_lock, flags);
+		}
+		if (req != NULL) {
+			nbd_end_request(req);
+			continue;
+		}
+		break;
+	}
 
 	device_remove_file(disk_to_dev(nbd->disk), &pid_attr);
-	nbd->pid = 0;
 	return 0;
 }
 
@@ -669,9 +736,20 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		set_capacity(nbd->disk, nbd->bytesize >> 9);
 		return 0;
 
-	case NBD_SET_TIMEOUT:
-		nbd->xmit_timeout = arg * HZ;
+	case NBD_SET_TIMEOUT: {
+		int xt;
+
+		xt = arg * HZ;
+		if (xt < 0)
+			return -EINVAL;
+		if (nbd->pid &&
+		    ((!nbd->xmit_timeout && xt) || (nbd->xmit_timeout && !xt)))
+			return -EBUSY;
+		dev_info(disk_to_dev(nbd->disk), "NBD_SET_TIMEOUT: %d -> %d\n",
+			 nbd->xmit_timeout / HZ, xt / HZ);
+		nbd->xmit_timeout = xt;
 		return 0;
+	}
 
 	case NBD_SET_FLAGS:
 		nbd->flags = arg;
@@ -694,6 +772,11 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		if (!nbd->sock)
 			return -EINVAL;
 
+		nbd->pid = task_pid_nr(current);
+		nbd->inflight = 0;
+		nbd->timedout = 0;
+		nbd->sender = NULL;
+		nbd->receiver = NULL;
 		mutex_unlock(&nbd->tx_lock);
 
 		if (nbd->flags & NBD_FLAG_READ_ONLY)
@@ -710,6 +793,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 				     nbd->disk->disk_name);
 		if (IS_ERR(thread)) {
 			mutex_lock(&nbd->tx_lock);
+			nbd->pid = 0;
 			return PTR_ERR(thread);
 		}
 		wake_up_process(thread);
@@ -717,6 +801,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 		kthread_stop(thread);
 
 		mutex_lock(&nbd->tx_lock);
+		nbd->pid = 0;
 		if (error)
 			return error;
 		sock_shutdown(nbd, 0);
@@ -731,6 +816,7 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
 			sockfd_put(sock);
 		nbd->flags = 0;
 		nbd->bytesize = 0;
+		nbd->xmit_timeout = 0;
 		bdev->bd_inode->i_size = 0;
 		set_capacity(nbd->disk, 0);
 		if (max_part > 0)
@@ -874,6 +960,8 @@ static int __init nbd_init(void)
 		init_waitqueue_head(&nbd_dev[i].waiting_wq);
 		nbd_dev[i].blksize = 1024;
 		nbd_dev[i].bytesize = 0;
+		spin_lock_init(&nbd_dev[i].timer_lock);
+		init_timer(&nbd_dev[i].req_timer);
 		disk->major = NBD_MAJOR;
 		disk->first_minor = i << part_shift;
 		disk->fops = &nbd_fops;
diff --git a/include/linux/nbd.h b/include/linux/nbd.h
index f62f78a..c1280ca 100644
--- a/include/linux/nbd.h
+++ b/include/linux/nbd.h
@@ -41,6 +41,12 @@ struct nbd_device {
 	pid_t pid; /* pid of nbd-client, if attached */
 	int xmit_timeout;
 	int disconnect; /* a disconnect has been requested by user */
+	spinlock_t timer_lock;
+	struct timer_list req_timer;
+	int inflight;
+	int timedout;
+	struct task_struct *sender;
+	struct task_struct *receiver;
 };
 
 #endif
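The new `NBD_SET_TIMEOUT` checks in the patch can be exercised outside the kernel with a small model. This is a sketch under assumptions: `struct timeout_model`, `model_set_timeout()` and the `MODEL_HZ` value of 250 are hypothetical stand-ins for `struct nbd_device`, the ioctl case and the kernel's `HZ`; only the accept/reject logic mirrors the diff above.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_HZ 250	/* assumed tick rate, for illustration only */

/* Hypothetical stand-in for the two nbd_device fields the check reads. */
struct timeout_model {
	int pid;		/* non-zero while an nbd-client is attached */
	int xmit_timeout;	/* timeout in ticks, 0 = disabled */
};

/*
 * Mirrors the NBD_SET_TIMEOUT case from the patch: negative values are
 * rejected, and an attached device may change its timeout value but may
 * not switch between "disabled" and "enabled".
 */
static int model_set_timeout(struct timeout_model *d, long arg)
{
	int xt = (int)(arg * MODEL_HZ);

	if (xt < 0)
		return -EINVAL;
	if (d->pid &&
	    ((!d->xmit_timeout && xt) || (d->xmit_timeout && !xt)))
		return -EBUSY;
	d->xmit_timeout = xt;
	return 0;
}
```

For example, on a device attached with a 5 second timeout, changing it to 10 seconds succeeds, while setting it to 0 returns `-EBUSY`; the timeout is only cleared when the device is torn down, where the patch resets `nbd->xmit_timeout` to 0.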

