Hi All,

I was discussing my recent adventures with LVM2 with a friend, who suggested I post them to the list in case someone could benefit from my experiences. So here goes:

I had a server that someone else had set up (to my disgust) in the following way:

A Sun Fire X2100 with two 250 GB SATA disks. They created a small /boot partition on sda1 for the boot stuff, then created an LVM RAID 0 volume group from the rest of the space: sda2 plus sdb1, a partition using the whole of the second disk. They then installed Red Hat Enterprise Linux 4 and created two logical volumes, one for root and another for swap.

Now somewhere along the way, some "bright" chap realised that the server had two disks but he was only seeing one in use :-) So the chap formatted /dev/sdb1 and mounted it as a separate mount point. Since he did not reboot, this mistake was never discovered.

So one fine day, the data on the server grew beyond the capacity of the first disk and LVM2 attempted to extend onto the second disk in the volume group. Predictably, this resulted in data corruption and major LVM errors; the server crashed and failed to reboot because the LVM metadata was corrupted.

Now the fun part: There was no backup of this server.

However, the errors LVM was throwing actually said it could not find the physical volume with uuid blah blah... so I had a UUID to work with.
The suggestions I found around the web were as follows:

pvcreate -ff --uuid xxxx /dev/sdb1
vgcreate VolGroup00 /dev/sda2 /dev/sdb1
vgcfgrestore -f metadata-file VolGroup00

However, I did not have a backed-up copy of the metadata file. Reading around on the web told me that LVM keeps its metadata at the beginning of the device, in the first 255 sectors following the LVM label in the device's first sector. So I booted up with the rescue disk and used foremost to try to extract the text of the config files from the raw device using pattern matching, following the suggestions at http://blog.eliasprobst.eu/?p=3. This did not work for me, probably because, as I discovered later, there was no complete uncorrupted metadata file on the disk. Then I realised I was being too cute with my stuff and instead simply did:

dd if=/dev/sda2 of=/dd.txt count=255 skip=1 bs=512

then vi /dd.txt. I found a bunch of binary stuff and snippets of config files, but no complete config file, showing that the metadata copies were indeed corrupt. With a bit of cut and paste here and there, I put together a working config file.
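For anyone repeating this, the extraction step can be sketched as below. Everything here is illustrative: /tmp/fakedev stands in for the real /dev/sda2, and the seeded text is just a metadata-like snippet, not the real contents of my disk.

```shell
#!/bin/sh
# Stand-in for /dev/sda2: a small file seeded with an LVM-metadata-like
# text snippet (illustrative only).
printf 'VolGroup00 {\nid = "xxxx-xxxx"\nseqno = 3\n}\n' > /tmp/fakedev

# Copy the metadata area out to a regular file for inspection. On the
# real device you would add skip=1 to jump over the first 512-byte
# sector (the LVM label).
dd if=/tmp/fakedev of=/tmp/dd.txt bs=512 count=255 2>/dev/null

# grep -a treats the (partly binary) dump as text and pulls out the
# config snippets.
grep -a 'VolGroup00' /tmp/dd.txt
# → VolGroup00 {
```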

However, I then ran into my next hurdle: vgcfgrestore failed. I can only assume that because the disk was already full and the volume group configuration was still corrupt, there was no space to write my configuration file, as the logical volumes were not available to move the extra data onto the second disk.

So I used dd again, this time to manually overwrite that section of the raw disk with my reconstructed metadata file. However, unless vgcfgrestore has actually failed and you have no other options, this is a very risky stunt and should be avoided. I only did it after dd'ing the entire disk onto another disk, just in case I made a mistake.
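The shape of that stunt, sketched with plain files standing in for the real devices (on the server the target was the raw partition; do NOT run this against real hardware without the full-disk backup first):

```shell
#!/bin/sh
# Fake 300-sector "device" and a stand-in reconstructed metadata file.
dd if=/dev/zero of=/tmp/part.img bs=512 count=300 2>/dev/null
printf 'VolGroup00 { reconstructed metadata goes here }\n' > /tmp/meta.txt

# 1. Full backup of the "disk" first, in case of mistakes.
dd if=/tmp/part.img of=/tmp/part.backup.img bs=512 2>/dev/null

# 2. Write the reconstructed metadata back. seek=1 skips the first
#    sector (the LVM label area); conv=notrunc leaves the rest of the
#    device untouched instead of truncating the output.
dd if=/tmp/meta.txt of=/tmp/part.img bs=512 seek=1 conv=notrunc 2>/dev/null
```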

Now when I did vgscan I was able to see:

    Found volume group "VolGroup00" using metadata type lvm2

pvscan likewise showed both hard disks in the volume group. I then ran:

    vgchange -a y VolGroup00

then lvscan, which showed me:

    ACTIVE   '/dev/VolGroup00/LogVol00' [476.38 GB] inherit
    ACTIVE   '/dev/VolGroup00/LogVol01' [512.00 MB] inherit

Voila: now that I could see my volumes, I could mount them and access my data. Mounting them proved to be a problem initially, though. The data was so corrupt that when I mounted the filesystem and listed the root directory with ls, it could not even tell whether /usr, /opt and /var were directories or files, and put a ? in that field. That meant I needed to run fsck on the filesystem to see if I could get this fixed.

Running fsck yielded errors and kept bombing out while trying to fix the first couple of inodes, alluding to a corrupted superblock. So I ran mke2fs -n /dev/VolGroup00/LogVol00, which showed me the alternate superblocks for the filesystem. I was then able to run fsck -b superblock -y /dev/VolGroup00/LogVol00. Remember to run fsck with -y: initially I did not, and had to keep answering y to the fsck prompts. I got tired, cancelled the fsck and redid it with -y, which proved a good decision because the fsck ran for over 36 hours with those prompts flying by so fast on the screen that there must have been at least 100,000 of them.
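The superblock trick can be sketched on a scratch image file; on the server the device was /dev/VolGroup00/LogVol00, and the sizes here are just small illustrative values. Requires e2fsprogs (mke2fs, e2fsck).

```shell
#!/bin/sh
# A 16 MB scratch filesystem with 1 KB blocks.
dd if=/dev/zero of=/tmp/fs.img bs=1024 count=16384 2>/dev/null
mke2fs -F -q -b 1024 /tmp/fs.img

# -n prints what mke2fs *would* do -- including the backup superblock
# locations -- without actually writing a filesystem.
mke2fs -F -n -b 1024 /tmp/fs.img | grep -A1 'Superblock backups'

# Repair from an alternate superblock (8193 for a 1 KB block size),
# answering yes to every prompt. e2fsck exits non-zero after making
# repairs, hence the || true.
e2fsck -y -b 8193 /tmp/fs.img || true
```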

A couple of things to note:

1. If using LVM, be sure to have a backup of your /etc directory, or at least of /etc/lvm (LVM keeps metadata backups there under backup/ and archive/).
2. If you do not have a backup and end up pulling a config file off the raw disk, pay close attention to the } characters that close sections in the LVM metadata file.
3. The usual: take backups. I wouldn't have needed to go through any of this if I had backups of my data. I would have simply reinstalled and then restored from backup.
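For point 1, a minimal sketch of the kind of backup that would have saved me all this grief. The paths are illustrative (/tmp/etc-lvm stands in for /etc/lvm so the sketch runs anywhere); on a real box you would archive /etc/lvm itself and ship the tarball to another machine.

```shell
#!/bin/sh
# Stand-in for /etc/lvm with one fake metadata backup file in it.
mkdir -p /tmp/etc-lvm/backup /tmp/etc-lvm/archive
printf 'VolGroup00 { }\n' > /tmp/etc-lvm/backup/VolGroup00

# Archive it; in real life, copy this tarball off the machine.
tar -czf /tmp/lvm-config-backup.tar.gz -C /tmp etc-lvm

# Sanity-check the archive contents.
tar -tzf /tmp/lvm-config-backup.tar.gz
```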

_______________________________________________
LUG mailing list
[email protected]
http://kym.net/mailman/listinfo/lug
LUG is generously hosted by INFOCOM http://www.infocom.co.ug/

The above comments and data are owned by whoever posted them (including 
attachments if any). The List's Host is not responsible for them in any way.
---------------------------------------
