Your message dated Wed, 7 Oct 2009 23:14:14 +0200
with message-id <[email protected]>
and subject line Re: Bug#549904: nbd-client: md raid1 hangs over nbd device if
nbd-server is dead
has caused the Debian Bug report #549904,
regarding nbd-client: md raid1 hangs over nbd device if nbd-server is dead
to be marked as done.
This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.
(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)
--
549904: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=549904
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: nbd-client
Version: 1:2.9.11-3
Severity: important
With the raid1 setup below, if the nbd-server stops, nbd-client dies too, the
/dev/nbd0 device disappears, and a "cat /proc/mdstat" simply hangs. The raid
layer seems to get no error from the nbd0 device, so it never disables the
device. Adding the -persist option or timeout=5 to the nbd-client command
line doesn't help.
Any ideas or requests for more information are welcome,
greetings
Hermann
# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 nbd0[1](W) cciss/c0d1[0]
292935872 blocks [2/2] [UU]
#commandline:
nbd-client timeout=5 server.iwr.uni-heidelberg.de 12399 /dev/nbd0
#dmesg (nothing else):
[353466.516947] nbd0: unknown partition table
[353545.607459] nbd0: Receive control failed (result -32)
[353545.666352] nbd0: shutting down socket
[353545.712550] nbd0: queue cleared
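One way to confirm the "device disappears" symptom without touching the md array is to look for the device name in /proc/partitions. The following is only a sketch: the in_partitions helper is hypothetical (it is not part of nbd-client), and it assumes the usual four-column layout of /proc/partitions with the device name in the last column.

```shell
#!/bin/sh
# Hypothetical helper: succeed if device name $1 appears in the fourth
# column of a /proc/partitions-style table ($2, default /proc/partitions).
in_partitions() {
    awk -v dev="$1" '$4 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/partitions}"
}

# nbd0 is the device name used in this report.
if in_partitions nbd0; then
    echo "nbd0 is listed in /proc/partitions"
else
    echo "nbd0 is not listed in /proc/partitions"
fi
```

Unlike "cat /proc/mdstat", this only reads the partition table exported by the kernel, so it should not block on the md layer.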
-- System Information:
Debian Release: 5.0.3
APT prefers stable
APT policy: (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.26-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/bash
Versions of packages nbd-client depends on:
ii debconf [debconf-2.0] 1.5.24 Debian configuration management sy
ii libc6 2.7-18 GNU C Library: Shared libraries
nbd-client recommends no packages.
nbd-client suggests no packages.
-- debconf information:
* nbd-client/killall: false
nbd-client/device:
nbd-client/host:
nbd-client/port:
nbd-client/type: raw
nbd-client/number: 0
nbd-client/no-auto-config:
--- End Message ---
--- Begin Message ---
On Wed, Oct 07, 2009 at 07:54:59PM +0200, Wouter Verhelst wrote:
> On Wed, Oct 07, 2009 at 11:07:42AM +0200, Hermann Lauer wrote:
> > On Wed, Oct 07, 2009 at 12:03:01AM +0200, Wouter Verhelst wrote:
> > > > with the raid1 setup below if the nbd-server stops nbd-client dies too,
> > > > the /dev/nbd0 device
> > > > disappears
> > >
> > > What do you mean by 'the device disappears'? The device node will not go
> > > away; do you just mean to say that the connection is lost?
> >
> > Sorry, I mean disappearing from /proc/partitions.
>
> Okay.
>
> > > What happens if you wait "timeout" seconds and try to read from the
> > > device manually (e.g., by doing "dd if=/dev/nbd0 of=/dev/zero count=1")?
> >
> > The nbd-client dies, and the partition is no longer in /proc/partitions.
> >
> > # dd if=/dev/nbd0 of=/dev/zero count=1
> > 0+0 records in
> > 0+0 records out
> > 0 bytes (0 B) copied, 3.9789e-05 s, 0.0 kB/s
> >
> > dd on a never used /dev/nbd* device reports the same.
>
> Hm.
>
> Looks like the kernel no longer produces an error when trying to read
> from a not-connected NBD device.
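The manual check quoted above can be sketched as a small script that tells the two behaviours apart: a read that fails with an error (which is what the md layer needs in order to fail the device) versus a read that succeeds but returns no data (the "0+0 records" output quoted above). The probe_read function and its messages are hypothetical; /dev/nbd0 is the device from this report.

```shell
#!/bin/sh
# Hypothetical helper: try to read one 512-byte sector from $1 and
# classify the result as "ok", "empty" or "error".
probe_read() {
    probe_tmp=$(mktemp) || return 2
    if dd if="$1" of="$probe_tmp" bs=512 count=1 2>/dev/null; then
        probe_bytes=$(wc -c < "$probe_tmp")
        probe_bytes=$((probe_bytes + 0))   # normalise padded wc output
        if [ "$probe_bytes" -gt 0 ]; then
            echo "ok: read $probe_bytes bytes"
        else
            echo "empty: read succeeded but returned 0 bytes"
        fi
    else
        echo "error: read failed"
    fi
    rm -f "$probe_tmp"
}

# Probe the device given on the command line, defaulting to /dev/nbd0.
probe_read "${1:-/dev/nbd0}"
```

On a disconnected device, "empty" corresponds to the behaviour reported in this bug, while "error" is the behaviour one would expect from a kernel that reports the lost connection.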
As I suspected, this is a kernel bug. The good news is that Paul
Clements (who maintains the kernel side of NBD) already fixed it back in
February; the bad news is that your kernel is too old to contain the
fix.
You should be able to get things to work properly again by either
upgrading to 2.6.29 or above, or compiling a kernel with the attached
patch applied.
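A rough way to check whether a running kernel is at or above 2.6.29 is sketched below. The kernel_has_fix helper is hypothetical, and a version comparison is only a heuristic: a distribution kernel with an older version number may carry the fix as a backport, which this script cannot detect.

```shell
#!/bin/sh
# Hypothetical helper: succeed if kernel release $1 (as printed by
# "uname -r") is 2.6.29 or newer.
kernel_has_fix() {
    kv_rest=${1%%-*}                       # "2.6.26-2-amd64" -> "2.6.26"
    kv_major=${kv_rest%%.*}; kv_rest=${kv_rest#*.}
    kv_minor=${kv_rest%%.*}; kv_rest=${kv_rest#*.}
    kv_patch=${kv_rest%%.*}
    # Drop any non-numeric suffix such as "+" and default to 0.
    kv_major=${kv_major%%[!0-9]*}; kv_major=${kv_major:-0}
    kv_minor=${kv_minor%%[!0-9]*}; kv_minor=${kv_minor:-0}
    kv_patch=${kv_patch%%[!0-9]*}; kv_patch=${kv_patch:-0}

    if [ "$kv_major" -ne 2 ]; then [ "$kv_major" -gt 2 ]; return; fi
    if [ "$kv_minor" -ne 6 ]; then [ "$kv_minor" -gt 6 ]; return; fi
    [ "$kv_patch" -ge 29 ]
}

if kernel_has_fix "$(uname -r)"; then
    echo "kernel version is 2.6.29 or newer"
else
    echo "kernel version is older than 2.6.29"
fi
```

For the kernel in this report, 2.6.26-2-amd64, the check fails, which matches the advice to upgrade or to build a patched kernel.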
Regards,
--
The biometric identification system at the gates of the CIA headquarters
works because there's a guard with a large gun making sure no one is
trying to fool the system.
http://www.schneier.com/blog/archives/2009/01/biometrics.html
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 34f80fa..8299e2d 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -549,6 +549,15 @@ static void do_nbd_request(struct request_queue * q)
 
 		BUG_ON(lo->magic != LO_MAGIC);
 
+		if (unlikely(!lo->sock)) {
+			printk(KERN_ERR "%s: Attempted send on closed socket\n",
+				lo->disk->disk_name);
+			req->errors++;
+			nbd_end_request(req);
+			spin_lock_irq(q->queue_lock);
+			continue;
+		}
+
 		spin_lock_irq(&lo->queue_lock);
 		list_add_tail(&req->queuelist, &lo->waiting_queue);
 		spin_unlock_irq(&lo->queue_lock);
--- End Message ---