Bug#549660: marked as done (md raid1 + lvm2 + snapshot resulted in lvm2 hang)

Debian Bug Tracking System Mon, 05 Oct 2009 07:50:07 -0700

Your message dated Mon, 5 Oct 2009 16:36:38 +0200
with message-id <[email protected]>
and subject line Re: Bug#549660: md raid1 + lvm2 + snapshot resulted in lvm2 hang
has caused the Debian Bug report #549660,
regarding md raid1 + lvm2 + snapshot resulted in lvm2 hang
to be marked as done.


This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)


-- 
549660: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=549660
Debian Bug Tracking System
Contact [email protected] with problems

--- Begin Message ---

Package: lvm2
Version: 2.02.39-7
Severity: important

Hello,

This problem is sort of similar to #419209 but I believe
it is different because in my case the snapshot 
was successfully created and the volume stalled 
later while writing to the snapshot.

I use a proxmox 1.3:
Linux ns300364.ovh.net 2.6.24-7-pve #1 SMP PREEMPT Fri Aug 21 09:07:39 CEST 2009 x86_64 GNU/Linux

The system was installed a couple weeks ago and is not an upgrade.
Snapshots worked fine until the problem occurred,
even though the disk configuration had not changed.


My configuration:

md1: md RAID 1 + ext3 mounted as /
md0: md RAID 1 + lvm2, divided into 2 ext3 volumes, vmdata and vmbackups,
mounted as /var/lib/vz and /backups.


r...@ns300364:/backups/tmp# lvdisplay
  --- Logical volume ---
  LV Name                /dev/data/vmdata
  VG Name                data
  LV UUID                9CzFBp-k7fV-wlls-qeeG-v7Or-u1pq-9XhKKy
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                309.57 GB
  Current LE             79250
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           254:1

  --- Logical volume ---
  LV Name                /dev/data/vmbackup
  VG Name                data
  LV UUID                jzCjXx-IodU-chBx-Aw3L-JUbv-dRho-vaOFCl
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                309.57 GB
  Current LE             79250
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           254:4

r...@ns300364:~# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90
  Creation Time : Tue Sep 15 17:48:43 2009
     Raid Level : raid1
     Array Size : 664986496 (634.18 GiB 680.95 GB)
  Used Dev Size : 664986496 (634.18 GiB 680.95 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Oct  4 02:02:04 2009
          State : active, recovering
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

 Rebuild Status : 29% complete

           UUID : ab296276:ea3e622e:7008e345:84b8f442 (local to host ns300364.ovh.net)
         Events : 0.17

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3

Symptoms:

- snapshot creation for vmdata OK
- backup on vmbackups started OK
- after writing about 1Gb the snapshot stalled. I mean that all requests
to read files on the lvm volume will hang. 
However, "ls" and "cd" still work and I can get directory listings.
Any command that reads file contents stalls the ssh session (e.g. cat, cp, mv).
In particular, "cat /backups/phil.log" also stalls the ssh session.
Remember that the snapshot is of volume vmdata, while the "cat" above concerns
volume vmbackups.
- smartctl does not report any problem (including the long test)
- "wa" in "top" is stuck at 99%, CPU usage is near zero.
- the snapshot is visible in /dev/mapper
- the snapshot cannot be removed (lvremove -f). Again, no error is reported; it
just hangs with no output at all.
- the system seems to work fine as long as nothing tries to read from either of
the 2 lvm2 volumes.
- no error is reported in messages or syslog.
- it seems an md check started after the snapshot creation. This check
also stalled at 29% (speed=0K/sec). Again, no error was reported.
- soft reboot did not work
- hard reboot worked, but an md resync started and stalled at 0.1%, leaving the
system in the same state as before the hard reboot.
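
A minimal sketch of how a stalled check or resync like the one described
above could be spotted, assuming the usual /proc/mdstat layout in which the
progress line carries a "speed=" field. The function name stalled_md_arrays
is hypothetical and not part of mdadm.

```shell
# Print the name of every md array whose check/resync/recovery speed is 0K/sec.
# Reads /proc/mdstat-style text on standard input.
stalled_md_arrays() {
    awk '
        /^md[0-9]+ :/   { name = $1 }     # remember the array this block describes
        /speed=0K\/sec/ { print name }    # progress line reporting zero throughput
    '
}
```

On a live system this would be run as "stalled_md_arrays < /proc/mdstat". A
zero speed can also appear briefly on a healthy array, so it only hints at a
stall when it persists across several readings.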

To recover a working system I set sdb3 as faulty, removed sdb3 from the raid1
and hard rebooted. That worked, and I could remove the snapshot and access the
data on both lvm volumes. Since then, I have not tried to create a snapshot,
and the system seems to work fine.
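
The recovery steps above roughly correspond to mdadm's manage-mode --fail and
--remove options. The sketch below is an illustration only: the exact
commands used were not recorded in this report, and the helper names run and
drop_mirror_half are hypothetical.

```shell
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# drop_mirror_half ARRAY MEMBER
# Mark MEMBER as faulty in ARRAY, then remove it from the array.
drop_mirror_half() {
    array=$1
    member=$2
    run mdadm "$array" --fail "$member"
    run mdadm "$array" --remove "$member"
}
```

With DRY_RUN=1 set, "drop_mirror_half /dev/md0 /dev/sdb3" only prints the two
mdadm commands. Without it, the commands are executed and need root. Note
that this leaves the RAID 1 running degraded on a single disk.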

Greetings,

Phil Ten

--- End Message ---

--- Begin Message ---

On Mon, Oct 05, 2009 at 02:30:01PM +0400, Phil Ten wrote:
> This problem is sort of similar to #419209 but I believe
> it is different because in my case the snapshot 
> was successfully created and the volume stalled 
> later while writing to the snapshot.

lvm2 only instructs the device-mapper subsystem of the kernel to make a
snapshot. So it is a kernel problem if it breaks later.

> I use a proxmox 1.3:
> Linux ns300364.ovh.net 2.6.24-7-pve #1 SMP PREEMPT Fri Aug 21 09:07:39 CEST 2009 x86_64 GNU/Linux

This is not a Debian kernel, so you have to ask its maintainer for
help.

> - snapshot creation for vmdata OK

How did you create the snapshot?

> - after writing about 1Gb the snapshot stalled. I mean that all requests
> to read files on the lvm volume will hang. 

This looks like a full snapshot, but this should be mentioned in the
log.
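
One way to check for a full snapshot is to look at the device-mapper status
of the snapshot target. A minimal sketch, assuming the "dmsetup status" line
format "name: start length snapshot used/total" (or "Invalid" once the
snapshot has been invalidated); the function name snapshot_usage and the
device names in the usage note are hypothetical.

```shell
# Print fill level for every snapshot target found in "dmsetup status" output
# read from standard input. snapshot-origin targets are skipped.
snapshot_usage() {
    awk '
        $4 == "snapshot" {
            name = $1
            sub(/:$/, "", name)                  # strip the trailing colon
            if ($5 == "Invalid")
                print name, "invalid"            # snapshot already invalidated
            else if (split($5, u, "/") == 2 && u[2] > 0)
                printf "%s %d%%\n", name, 100 * u[1] / u[2]
        }
    '
}
```

On a live system this would be run as root with "dmsetup status |
snapshot_usage", printing lines such as "data-snap 50%". A value near 100% or
"invalid" would support the full-snapshot explanation.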

Closing as lvm2 can't be blamed and the other components are non-Debian.

Bastian

-- 
Killing is wrong.
                -- Losira, "That Which Survives", stardate unknown

--- End Message ---
