Hi All,
I was discussing my recent adventures with LVM2 with a friend, who
suggested that I post my experiences to the list in case someone could
benefit from them. So here goes:
I had a server that someone else had set up (to my disgust) in the
following way:
A Sun Fire X2100 with two 250 GB SATA disks. They created a small /boot
partition on sda1 for the boot files, then created an LVM RAID 0 volume
from the rest of the space: sda2, plus sdb1, a partition using the
whole of the second disk. They then installed Red Hat Enterprise
Server 4 and created two logical volumes, one for root and another for
swap.
Now somewhere along the way, some "bright" chap realised that the
server had two disks but only one appeared to be in use :-) So he
formatted /dev/sdb1 and set it up as a separate mount point. Since he
did not reboot, the mistake was never discovered.
Then one fine day the data on the server grew beyond the capacity of
the first disk and LVM2 attempted to extend onto the second disk in
the volume. Predictably, this resulted in data corruption and major
LVM errors; the server crashed and failed to reboot because the LVM
metadata was corrupted.
Now the fun part: There was no backup of this server.
However, the errors LVM was throwing actually said "cannot find
physical volume with uuid blah blah...", so I had a UUID to work
with.
Variously on the web the suggestions were as follows:
pvcreate -ff --uuid xxxx /dev/sdb1
vgcreate VolGroup00 /dev/sda2 /dev/sdb1
vgcfgrestore -f metadata-file VolGroup00
However, I did not have a backed-up copy of the metadata file. Reading
a few things on the web showed me that LVM keeps its metadata at the
beginning of the physical volume, in the first 255 sectors following
the LVM label in the first sector of the partition.
So I booted up with the rescue disk and used foremost to try to
extract the text of the config files from the raw device using pattern
matching, following the suggestions at http://blog.eliasprobst.eu/?p=3.
However, this did not work for me, probably because, as I discovered
later, there was no complete uncorrupted metadata copy on the disk. I
realised I was being too clever with my approach and instead simply did:
dd if=/dev/sda2 of=/dd.txt count=255 skip=1 bs=512
then vi /dd.txt. I found a bunch of binary data and snippets of config
files, but no complete config file, confirming that the metadata
copies were indeed corrupt. With a bit of cut and paste here and
there, I put together a working config file.
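For anyone attempting the same cut-and-paste job, the LVM2 text
metadata has roughly this shape. This is a skeleton from memory with
placeholder UUIDs, extent counts and names, so treat every value here
as made up; your real file will differ:

```
# Rough skeleton of LVM2 text metadata (all values are placeholders)
VolGroup00 {
	id = "xxxxxx-xxxx-xxxx-xxxx-xxxx-xxxx-xxxxxx"
	seqno = 2
	status = ["RESIZEABLE", "READ", "WRITE"]
	extent_size = 65536		# in sectors, i.e. 32 MB extents
	max_lv = 0
	max_pv = 0

	physical_volumes {
		pv0 {
			id = "yyyyyy-...."
			device = "/dev/sda2"
			status = ["ALLOCATABLE"]
			pe_start = 384
			pe_count = 7000		# placeholder
		}
		pv1 {
			id = "zzzzzz-...."
			device = "/dev/sdb1"
			status = ["ALLOCATABLE"]
			pe_start = 384
			pe_count = 7000		# placeholder
		}
	}

	logical_volumes {
		LogVol00 {
			id = "wwwwww-...."
			status = ["READ", "WRITE", "VISIBLE"]
			segment_count = 1

			segment1 {
				start_extent = 0
				extent_count = 7000	# placeholder
				type = "striped"
				stripe_count = 1	# 1 stripe = linear
				stripes = ["pv0", 0]
			}
		}
	}
}
```

Note how every opening { needs its matching } — losing one of those
braces during the cut-and-paste is an easy way to make the file
unparseable.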
I then ran into my next hurdle: vgcfgrestore failed. I can only assume
that because the disk was already full and the volume group
configuration was still corrupted, there was no space for it to write
my configuration file back, since the logical volumes were not
available to move the extra data onto the second disk.
So I used dd again to manually overwrite that section of the raw disk
with the metadata file I had created. Unless vgcfgrestore has actually
failed and you have no other options, this is a very risky stunt and
should be avoided. I only attempted it after dd'ing the entire disk
onto a spare, in case I made a mistake.
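That risky write-back can be rehearsed on a scratch file first. A
sketch, where the scratch file, the toy metadata payload and the
hypothetical invocation shown in the comment are all mine, not from
the original incident:

```shell
# Rehearsal of the metadata write-back on a scratch file, so the
# dangerous command can be tested before aiming it at a real device.
# The real invocation would be of this shape (hypothetical):
#     dd if=metadata.txt of=/dev/sda2 seek=1 bs=512 conv=notrunc
disk=$(mktemp)                                   # stand-in for /dev/sda2
dd if=/dev/zero of="$disk" bs=512 count=256 2>/dev/null
meta=$(mktemp)
printf 'VolGroup00 { seqno = 2 }' > "$meta"      # toy metadata payload
# seek=1 skips the first 512-byte sector (the label area) on output;
# conv=notrunc stops dd truncating the target after the written bytes
dd if="$meta" of="$disk" seek=1 bs=512 conv=notrunc 2>/dev/null
# read the sector back to confirm the payload landed where expected
dd if="$disk" skip=1 bs=512 count=1 2>/dev/null | head -c 24
```

The seek/skip asymmetry is the part worth rehearsing: skip discards
input blocks (as in the extraction command above), while seek offsets
the output, and conv=notrunc matters when the target is a file image
rather than a raw device.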
Now when I ran vgscan I saw:
Found volume group "VolGroup00" using metadata type lvm2
I could also do pvscan and see both hard disks in the volume group.
I then ran vgchange VolGroup00 -a y,
then lvscan, which showed me:
ACTIVE '/dev/VolGroup00/LogVol00' [476.38 GB] inherit
ACTIVE '/dev/VolGroup00/LogVol01' [512.00 MB] inherit
Voila: now that I could see my volumes, I could mount them and access
my data. Mounting them proved a problem initially, however, as the
data was so corrupt that when I mounted the root volume and attempted
to list the root directory using ls, it could not even tell whether
/usr, /opt and /var were directories or files, and printed a ? in that
field. That meant I needed to run fsck on the filesystem to see if I
could get this fixed.
Running fsck yielded errors and kept bombing out while trying to fix
the first couple of inodes, alluding to a corrupted superblock. So I
then ran mke2fs -n /dev/VolGroup00/LogVol00, which showed me the
alternate superblocks for the filesystem. I was then able to run:
fsck -b superblock -y /dev/VolGroup00/LogVol00
Remember to run fsck with -y: initially I did not, and had to keep
answering y to the fsck prompts. I eventually got tired, cancelled the
fsck and redid it with -y, which proved a good decision, because the
fsck ran for over 36 hours with prompts flying by so fast on the
screen that there must have been at least 100,000 of them.
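The mke2fs -n / fsck -b sequence can be rehearsed safely on a
file-backed ext2 image before touching a real volume. A sketch,
assuming e2fsprogs is installed (the real device in my case was
/dev/VolGroup00/LogVol00; here everything happens in a temp file):

```shell
# Rehearse the alternate-superblock recovery on a 16 MB file-backed
# ext2 image instead of a real logical volume.
export PATH="$PATH:/sbin:/usr/sbin"   # mke2fs/e2fsck often live in sbin
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1024 count=16384 2>/dev/null
mke2fs -F -q "$img"                   # a filesystem to practice on
# mke2fs -n prints what it *would* do, including where the backup
# superblocks live, without writing anything to the device:
mke2fs -n -F "$img" | grep -A1 'Superblock backups'
# 8193 is the first backup for a 1 KiB-block filesystem; -y
# auto-answers every prompt (fsck hands ext2/ext3 off to e2fsck,
# so e2fsck -b is the same repair the fsck command line performs)
e2fsck -b 8193 -y "$img" >/dev/null 2>&1 || true
```

The || true is there because e2fsck exits non-zero (1) when it has
corrected errors, which it always will after being pointed at a backup
superblock.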
A couple of things to note:
1. If using LVM, be sure to have a backup of your /etc directory, or
at least your /etc/lvm directory.
2. If you do not have a backup and end up recovering a config file
from the raw disk, pay attention to the closing } braces in the LVM
metadata file.
3. The usual: take backups. I wouldn't have needed to go through any
of this if I had had backups of my data. I would simply have
reinstalled and then restored from backup.
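On point 1, a minimal sketch of archiving the LVM config directory.
This is rehearsed on a scratch directory so it runs anywhere; on a
real box you would point src at /etc/lvm (as root), or simply run
vgcfgbackup, which writes the current VG metadata under
/etc/lvm/backup:

```shell
# Sketch: tar up the LVM config directory. A scratch directory stands
# in for /etc/lvm so the example is self-contained.
src=$(mktemp -d)                        # stand-in for /etc/lvm
printf 'VolGroup00 metadata backup\n' > "$src/VolGroup00"
out="$src.tar.gz"
tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")"
tar -tzf "$out"                         # list the archive to verify
```

Of course, the tarball then needs to live somewhere other than the
disks it describes.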
_______________________________________________
LUG mailing list
[email protected]
http://kym.net/mailman/listinfo/lug
LUG is generously hosted by INFOCOM http://www.infocom.co.ug/