Duncan wrote:

I'd blame that on your choice of RAID (and ultimately on the defective hardware, but it wouldn't have been as bad on RAID-1 or RAID-6), more than on what was running on top of it.

Agree - RAID-6 would have helped in this particular circumstance (assuming I didn't lose more than one drive). The non-server hardware was still a big issue, though. I'm not sure I'd ever go with RAID-6 for personal use - that's a lot of money tied up in drives that add no usable capacity.
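
To put rough numbers on it (drive counts and sizes here are just illustrative):

    def usable_tb(drives, size_tb, parity_drives):
        # Usable capacity once the parity overhead is taken out.
        return (drives - parity_drives) * size_tb

    for n in (4, 6, 8):
        print(f"{n} x 1TB:",
              f"RAID-5 {usable_tb(n, 1, 1):.0f}TB,",
              f"RAID-6 {usable_tb(n, 1, 2):.0f}TB usable")

At four drives, RAID-6 gives up a third of the usable space RAID-5 would give me, and the gap only narrows at array sizes bigger than I'd run at home.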

What I'd guess happened is that the dirty/degraded crash happened while the set of stripes that also held the LVM2 record was being written, although it wasn't necessarily the LVM data itself being written - just something that happened to be in the same stripe set, so the parity covering it had to be rewritten as well. It's also possible the hardware error you mentioned was affecting the reliability of what the spindle returned even when it didn't cause resets. In that case, even if the data was on a different stripe, the resulting parity written could end up invalid, thus playing havoc with a recovery.

Sounds likely. I think the LVM2 metadata got corrupted. I'm a big fan of zfs and btrfs (once they're production-ready) precisely because they try to address the RAID stripe problem with copy-on-write right down to the physical level.
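
To make the stripe problem concrete, here's a toy model of the RAID-5 write hole - everything simulated in memory, so treat it as a sketch, not as what md literally does:

    from functools import reduce

    def parity(chunks):
        # RAID-5 parity is the bytewise XOR of a stripe's data chunks.
        return bytes(reduce(lambda a, b: a ^ b, block) for block in zip(*chunks))

    # A healthy 3-data-disk stripe with correct parity.
    stripe = [b"AAAA", b"BBBB", b"CCCC"]
    par = parity(stripe)

    # An in-place update of chunk 0 lands on disk...
    stripe[0] = b"XXXX"
    # ...but the box dies before the matching parity write:
    # par = parity(stripe)   # <- never happens (dirty shutdown)

    # Later, disk 1 fails and gets "recovered" from the stale parity:
    recovered = parity([stripe[0], stripe[2], par])
    print(recovered == b"BBBB")   # False - silent garbage, not the old data

A copy-on-write filesystem sidesteps this by never updating a live stripe in place: new data and new checksums go to fresh locations, and only then does a single atomic pointer update make them current.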

data=ordered is the middle ground and I believe what ext3 has always defaulted to, and what reiserfs has defaulted to for years.

Yup - using data=ordered. From a metadata-integrity standpoint I believe this has been shown to be equivalent to data=journal. As you point out, once LVM was hosed that didn't help much.
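
Roughly, the difference between the two modes looks like this (a conceptual sketch in Python, not ext3's actual internals):

    class FS:
        def __init__(self):
            self.journal = []   # replayed after a crash once committed
            self.disk = {}      # final on-disk locations

        def write_ordered(self, block, data, metadata):
            # data=ordered: file data reaches its final location *before*
            # the metadata transaction commits, so committed metadata never
            # points at unwritten data blocks.
            self.disk[block] = data
            self.journal.append(metadata)

        def write_journaled(self, block, data, metadata):
            # data=journal: data goes through the journal too (i.e. it is
            # written twice). The metadata guarantee is the same; the extra
            # cost buys crash-atomicity for the data contents themselves.
            self.journal.append((block, data, metadata))
            self.disk[block] = data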

Lucky, or more appropriately, wise you! There aren't many folks who back up to a normally-offline external device that regularly. Honestly, I don't.

Yeah - I've learned that lesson over time the hard way. I can't back up everything (at least not without a big investment), but I do use dar and par2 to back up everything important. I just create a dar backup weekly, and then run a script on a laptop to copy the data offline. I don't back up anything that requires snapshots (I use a cron job to do a mysql export separately and back that up), so that works fine for me. This is really just my high-value data - when my system was hosed I had to reinstall from stage3, but I had all my /etc config files, so getting up and running didn't take a huge amount of effort.

However, I did learn the hard way that some programs store their actual config files in /var and symlink them into /etc - be sure to catch those in your backups! My samba domain controller's SID ended up changing, which was a headache since my usernames lost their old permissions on all my XP workstations. Granted, this is a house with all of four users, which helped with the cleanup.
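
For what it's worth, the weekly job is conceptually something like the following - the paths, archive layout, and 10% redundancy figure are placeholders here, not my actual script:

    #!/usr/bin/env python3
    import datetime
    import subprocess

    stamp = datetime.date.today().isoformat()
    basename = f"/backup/weekly-{stamp}"    # dar appends .1.dar etc.

    # Dump MySQL separately, since dar can't snapshot a live database.
    with open(f"{basename}.sql", "w") as dump:
        subprocess.run(["mysqldump", "--all-databases"],
                       stdout=dump, check=True)

    # Archive the high-value trees. Note var/lib/samba: some packages keep
    # their real config under /var and only symlink it into /etc.
    subprocess.run(["dar", "-c", basename, "-R", "/",
                    "-g", "home", "-g", "etc", "-g", "var/lib/samba"],
                   check=True)

    # par2 recovery data, so a few corrupted blocks on the offline copy
    # don't cost the whole archive.
    subprocess.run(["par2", "create", "-r10", f"{basename}.1.dar"],
                   check=True)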

So... I guess that's something else I can add to my list now, for the next time I set up a new disk set or whatever. To the everything-portage-touches-on-root that I explained in the other replies, and the RAID-6 that I had already chosen over RAID-5, I can now add killing the LVM2 used in my current setup.


If you have RAID-6 I'm not sure it's worth worrying about getting rid of LVM2. At least, assuming you don't start having multiple drive failures (a real possibility with desktop hardware, with all the drives sharing the same power cords, interfaces, etc.).

If you want to think really long term, take a look at btrfs. It looks like it aims to be everything that zfs is, minus zfs's GPL-incompatible license. It's definitely not ready for prime time, but the proposed feature set looks better than zfs's. I don't like the inability to reshape zfs - you can add more arrays to your system, but you can't add one drive to an existing array (online or offline). Btrfs aims to be able to do this. Again, it is completely experimental at this point - don't use it except to try it out.

It will be possible to migrate ext3/4 directly in-place to btrfs, and even reverse the migration (minus any changes made since - it essentially snapshots the existing data). The only limitation is that if you delete files you won't get the space back until you give up the ability to migrate back to ext3, since the snapshot still holds the old data.
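
That space issue is easy to picture as reference counting (an illustrative model, not btrfs's actual extent trees):

    refcount = {}

    def reference(blocks):
        for b in blocks:
            refcount[b] = refcount.get(b, 0) + 1

    ext3_data = ["blk1", "blk2", "blk3"]  # data as laid out by ext3
    reference(ext3_data)                  # the preserved ext3 snapshot
    reference(ext3_data)                  # the live btrfs view of the same blocks

    # Deleting a file from the live filesystem drops only its own reference:
    refcount["blk2"] -= 1
    print(refcount["blk2"])   # 1 - the snapshot still pins the block on disk

Only dropping the rollback snapshot releases the last references and lets the allocator reuse the space.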
