Greg Oster wrote:
...
> Here's what I'd encourage you (or anyone else) to do:

Actually, I'd encourage you to try your own test. Results were
interesting.

> 1) Create a ccd as you describe in the HOWTO and mount the filesystem.

Used my own instructions, if you don't mind. :) Softdeps on. That may
matter. Or it may not. Not sure.

> 2) Start extracting 5 copies of src.tar.gz onto the filesystem (
> simultanously is preferred, but basically anything that will generate
> a lot of IO here is what is needed).

I wussed out here. Did one unpacking of a Maildir in a .tgz file. But
lots of IO, lots of thrashing, disks were basically saturated with
work, processor was waiting for disk. Lots of tiny files. On the other
hand, that's a lot more activity than this machine will ever see in
production.

My first (and second) test was copying the 86M .tgz file, but that was
horribly uninteresting. Resetting the machine well into the copy
resulted in a zero-byte file after fsck. Truncated. Not a big surprise,
really.

> 3) After that's been going for a while, and while still in progress,
> pull the power from the machine.

Drop power mid-write, and you are risking your disk. Yes, I have spiked
disks with a nail gun to test RAID in the past, but didn't feel like
possibly toasting two disks by powering down the machine mid-write at
this time. This system has purpose for me. :)

So, I hit the reset button on the machine. That should give something
similar to (though admittedly, not identical to) a crash. No, hitting
the reset is NOT the same as a power outage. It isn't the same as a
crash either -- in the latter case, I'm going to say that it is just
different, not easier or harder...so my test is only one kind of
failure (and I REALLY didn't feel like pulling a memory module out to
simulate a HW failure... :)

> 4) Fire the machine back up, configure the ccd again, and run fsck a
> few times to make sure the ccd filesystem is "clean".

Once did the job. Second fsck came up clean. Don't expect different
results on the third or fourth...

> 5) Now unconfigure the ccd.

Mounted each separately as a non-mirrored ccd file system.

> 6) Do an md5 checksum of each of the parts of the mirror, and see if
> they differ. (they shouldn't, but I bet the do!!)

I think the md5 test of the mirror elements is bogus here. I don't care
if an unallocated block is different. I care if the files are
different. I might not even care about that much. See below...

> If they differ, tell me how ccd detected that difference, and how it
> warned you that if the primary drive died that you'd have incorrect
> data. If they don't differ, go buy a lottery ticket, cause it's
> your lucky day! ;)

I used diff(1) to compare the two trees created by splitting the
mirror. No difference found. I.e., ccd(4) mirroring passed a somewhat
simplified version of your test. I even modified one of the files to
make sure I didn't blow the diff command usage... 188M of files in the
tree, no differences.

I will admit I was pleasantly surprised, though not totally shocked,
that it did. My first clue was what happened when I tried to interrupt
the copy of a single very large file to the ccd(4) file system. Even
though many megabytes had been transferred, by the time fsck got
finished, the file had been truncated to zero bytes (this test was
repeated twice, same results each time). Zero-byte files tend to match
pretty well. :)

I haven't looked closely at the code, but I rather suspect that the
ccd(4) code sends the same data out to both disks at very close to the
same time, without wandering off to do other things in between. In
order for things to get out of sync, the "event" would have to happen
between the time data was sent to the first disk and before it got sent
to the second. I'm not sure, but I suspect there are relatively few
times you will get a software crash that would cause that (yes, your
disk IO code could crash, but I suspect if that was prone to happening,
you have much bigger problems on your hands!).
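
For anyone repeating the comparison of the two split-mirror trees
described above, here is a minimal Python sketch of the same idea. It
is a stand-in for the diff(1) run, not the command actually used; the
function names are made up for illustration, and SHA-256 is an
arbitrary choice of digest.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(full) and not os.path.islink(full):
                digests[os.path.relpath(full, root)] = file_digest(full)
    return digests


def compare_trees(root_a, root_b):
    """Return (only_in_a, only_in_b, differing) as sorted path lists."""
    a = tree_digests(root_a)
    b = tree_digests(root_b)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    differing = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return only_a, only_b, differing
```

Calling compare_trees() on the mount points of the two halves (say
/mnt/a and /mnt/b, hypothetical paths) and getting three empty lists
back corresponds to diff(1) finding no differences. Unlike an md5 of
the raw partitions, this only looks at file contents, so differing
unallocated blocks are ignored.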

However, that argument doesn't cover power outages, HW failure, or
careless hitting of the reset button.

But let's think about this a moment...

The file system IS wrong. I was untarring a big .tgz file, and what is
on the file system does not match what was in the .tgz file, as it
hadn't finished! If that was a critical task, my mail spool is hosed
right now, and needs to be fixed. fsck didn't magically finish the job,
it just cleaned up the loose ends. It lets your system reboot, but that
isn't the same as saying, "nothing happened". fsck makes the file
system consistent, but it can't complete the interrupted job. I think
people forget this sometimes. I think I forget it sometimes. :)

So, that IS an error. That's expected when the system goes down hard,
mirror, no mirror, ccd(4), raid(4), hardware, whatever. It's going to
be incomplete, and possibly badly wrong (and maybe corrupted beyond
repair).

OK, let's say you are right, let's say my test is a fluke (and I'll be
quick to say, YES, I am sure under some circumstances, you WILL end up
with a data mismatch between disks!). Which disk is "right"? BOTH are
wrong, just differently wrong. Which one becomes the "master" during
the remirror? I've worked with a lot of Netware servers with SW disk
mirroring, a system I consider the best SW mirroring I've seen, and
never figured that one out. It makes a decision, it copies one to the
other. What if that decision is wrong? Well, who cares, they are BOTH
wrong, pick one and move on.

If the data being written when the event happens matters, you have to
re-do whatever you were doing, restore from backup, back out a
transaction on a TTS system, or otherwise deal with it. That process
will probably "heal" the active files on the ccd(4) set, having
re-written both of them.

On the other hand, if the data being written at the time of the crash
is something like logs, hey, it's undesirable to lose them, but does it
really matter that the two disks are different?
There was a nasty event, the data is going to be wrong (or missing
or .. ), regardless.

The machine I was testing on is going to be my new in-house logging
DNS/DHCP server. I'm using ccd(4) on the /var partition (where the logs
will end up) and on the /home partition (the rest will be
dumped/restored weekly). The only files that will be regularly written
to are going to be log files. If I end up with an event that causes the
drives to get out of sync, I really can't imagine a scenario where this
causes me problems that wouldn't be just as bad without mirroring. If
these logs are rotated, within a few days, I should be back to having
all active files in sync.

Short version: I recognize your concern. I suspect you are right, the
disks could get out of sync. I was a bit concerned about this for a
while myself. However, the more I think about this, the more I keep
coming to the "so what?" conclusion. My three tests indicated one can't
even reliably demonstrate a difference in the written files, though I'd
want to repeat it an infinite number more times before I say "and there
never will be a difference". :)

Yes, ccd(4) mirroring is not for every application. But for some, it
can be useful. My above-mentioned DNS/DHCP server is an example -- I'd
like to keep two copies of constantly changing data. If I lose one, I'd
like to have rapid repair. If I lose them both, it will not be the end
of the world. I'm less likely to lose them both with ccd(4) than I am
without any mirroring. This is good. It isn't worth the effort of a
RAIDframe kernel to me, it isn't worth the price of an Accusys box to
me.

Nick.
(shoulda bought a lottery ticket)