Greg Oster wrote:
...
> Here's what I'd encourage you (or anyone else) to do:

Actually, I'd encourage you to try your own test.  Results were interesting.

> 1) Create a ccd as you describe in the HOWTO and mount the filesystem.

used my own instructions, if you don't mind. :)
Softdeps on.  That may matter.  Or it may not.  Not sure.

> 2) Start extracting 5 copies of src.tar.gz onto the filesystem (
> simultaneously is preferred, but basically anything that will generate 
> a lot of IO here is what is needed).

I wussed out here.  Did one unpacking of a Maildir in a .tgz file.  But
lots of IO, lots of thrashing, disks were basically saturated with work,
processor was waiting for disk.  Lots of tiny files.  On the other hand,
that's a lot more activity than this machine will ever see in production.

My first (and second) test was copying the 86M .tgz file, but that was
horribly uninteresting.  Resetting the machine well into the copy
resulted in a zero-byte file after fsck.  Truncated.  Not a big
surprise, really.

> 3) After that's been going for a while, and while still in progress, 
> pull the power from the machine.

Drop power mid write, you are risking your disk.  Yes, I have spiked
disks with a nail gun to test RAID in the past, but didn't feel like
possibly toasting two disks by powering down the machine mid-write at
this time.  This system has purpose for me. :)

So, I hit the reset button on the machine.  That should give something
similar to (though admittedly, not identical to) a crash.

No, hitting the reset is NOT the same as a power outage.  It isn't the
same as a crash either -- in the latter case, I'm going to say that it is
just different, not easier or harder...so my test is only one kind of
failure (and I REALLY didn't feel like pulling a memory module out to
simulate a HW failure... :)

> 4) Fire the machine back up, configure the ccd again, and run fsck a 
>    few times to make sure the ccd filesystem is "clean".

once did the job.  Second fsck came up clean.  Don't expect different
results on the third or fourth...

> 5) Now unconfigure the ccd.

mounted each separately as a non-mirrored ccd file system.

> 6) Do an md5 checksum of each of the parts of the mirror, and see if 
> they differ.  (they shouldn't, but I bet they do!!)

I think the md5 test of the mirror elements is bogus here.
I don't care if an unallocated block is different. I care if the files
are different.  I might not even care about that much.  See below...

> If they differ, tell me how ccd detected that difference, and how it 
> warned you that if the primary drive died that you'd have incorrect 
> data.  If they don't differ, go buy a lottery ticket, cause it's
> your lucky day! ;) 

I used diff(1) to compare the two trees created by splitting the mirror.

No difference found.  i.e., ccd(4) mirroring passed a somewhat
simplified version of your test.  I even modified one of the files to
make sure I didn't blow the diff command usage...  188M of files in the
tree, no differences.
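
For anyone who wants to repeat the comparison without diff(1), here is a
minimal sketch in Python of the same idea: walk both trees and compare
the files byte by byte.  It is not what I ran (I used diff), and the
function names (list_files, same_content, compare_trees) are made up for
illustration.  It assumes regular files only and ignores empty
directories, permissions and timestamps.

```python
import os


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def same_content(path_a, path_b, chunk_size=65536):
    """Compare two regular files byte by byte, reading in chunks."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(chunk_size)
            block_b = fb.read(chunk_size)
            if block_a != block_b:
                return False
            if not block_a:
                # Both files hit end-of-file at the same point.
                return True


def compare_trees(tree_a, tree_b):
    """Return (only_in_a, only_in_b, differing) as sorted relative paths."""
    files_a = list_files(tree_a)
    files_b = list_files(tree_b)
    only_in_a = sorted(files_a - files_b)
    only_in_b = sorted(files_b - files_a)
    differing = sorted(
        rel
        for rel in files_a & files_b
        if not same_content(os.path.join(tree_a, rel),
                            os.path.join(tree_b, rel))
    )
    return only_in_a, only_in_b, differing
```

With the two halves of the mirror mounted at, say, /mnt/a and /mnt/b
(hypothetical mount points), compare_trees("/mnt/a", "/mnt/b") returning
three empty lists corresponds to diff finding nothing.  As with my diff
run, it is worth changing one file on purpose first to confirm that the
check actually reports it.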

I will admit I was pleasantly surprised, though not totally shocked that
it did.

My first clue was what happened when I tried to interrupt the copy of a
single very large file to the ccd(4) file system.  Even though many
megabytes had been transferred, by the time fsck got finished, the file
had been truncated to zero bytes (this test was repeated twice, same
results each time).  Zero byte files tend to match pretty well. :)

I haven't looked closely at the code, but I rather suspect that the
ccd(4) code sends the same data out to both disks at very close to the
same time, without wandering off to do other things in between.  In
order for things to get out of sync, the "event" would have to happen
between the time data was sent to the first disk and before it got sent
to the second.  I'm not sure, but I suspect there are relatively few
times you will get a software crash that would cause that (yes, your
disk IO code could crash, but I suspect if that was prone to happening,
you have much bigger problems on your hands!).  However, that doesn't
cover power outages, HW failure, or careless hitting of the reset button.
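
To make that window concrete, here is a toy model in Python.  It is only
a sketch of my guess above, not the ccd(4) code, which I have not read:
it assumes each block goes to the first disk and then straight to the
second, with nothing else in between, and it ignores driver queues,
drive write caches and everything else real hardware does.  The names
(run_until_crash, out_of_sync_blocks) are made up for illustration.

```python
def run_until_crash(writes, crash_step):
    """Apply mirrored writes to two toy disks (dicts of block -> data).

    Each logical write is two device writes: first disk, then second.
    The "crash" happens once crash_step device writes have completed.
    """
    disk_a, disk_b = {}, {}
    steps_done = 0
    for block_no, data in writes:
        for disk in (disk_a, disk_b):
            if steps_done == crash_step:
                return disk_a, disk_b
            disk[block_no] = data
            steps_done += 1
    return disk_a, disk_b


def out_of_sync_blocks(disk_a, disk_b):
    """Return the sorted block numbers whose contents differ between disks."""
    all_blocks = disk_a.keys() | disk_b.keys()
    return sorted(b for b in all_blocks if disk_a.get(b) != disk_b.get(b))


if __name__ == "__main__":
    writes = [(0, "x"), (1, "y"), (2, "z")]
    for crash_step in range(2 * len(writes) + 1):
        a, b = run_until_crash(writes, crash_step)
        print(crash_step, out_of_sync_blocks(a, b))
```

In this model the two disks differ only when the crash lands after an
odd number of device writes, that is, between the write to the first
disk and the write to the second, and then only in the one block that
was in flight.  How often a real crash lands in that window is exactly
what the model cannot tell you.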

But let's think about this a moment...

The file system IS wrong.  I was untarring a big .tgz file, and what is
on the file system does not match what was in the .tgz file, as it
hadn't finished!  If that was a critical task, my mail spool is hosed
right now, and needs to be fixed.  fsck didn't magically finish the job,
it just cleaned up the loose ends.  It lets your system reboot, but that
isn't the same as saying, "nothing happened".  fsck makes the file
system consistent, but it can't complete the interrupted job.  I think
people forget this sometimes.  I think I forget it sometimes. :)

So, that IS an error.  That's expected when the system goes down hard,
mirror, no mirror, ccd(4), raid(4), hardware, whatever.  It's going to
be incomplete, and possibly badly wrong (and maybe corrupted beyond
repair).  Ok, let's say you are right, let's say my test is a fluke (and
I'll be quick to say, YES, I am sure under some circumstances, you WILL
end up with a data mismatch between disks!).  Which disk is "right"?
BOTH are wrong, just differently wrong.  Which one becomes the "master"
during the remirror?  I've worked with a lot of Netware servers with SW
disk mirroring, a system I consider the best SW mirroring I've seen, and
I never figured that one out.  It makes a decision, it copies one to the
other.  What if that decision is wrong?  Well, who cares, they are BOTH
wrong, pick one and move on.

If the data being written when the event happens matters, you have to
re-do whatever you were doing, restore from backup, back out a
transaction on a TTS system, or otherwise, deal with it.  That process
will probably "heal" the active files on the ccd(4) set, having
re-written both of them.

On the other hand, if the data being written at the time of the crash is
something like logs, hey, it's undesirable to lose them, but does it
really matter that the two disks are different?  There was a nasty
event, the data is going to be wrong (or missing or .. ), regardless.


The machine I was testing on is going to be my new in-house logging
DNS/DHCP server.  I'm using ccd(4) on the /var partition (where the logs
will end up) and on the /home partition (the rest will be
dumped/restored weekly).  The only files that will be regularly written
to are going to be log files.  If I end up with an event that causes the
drives to get out of sync, I really can't imagine a scenario where this
causes me problems that wouldn't be just as bad without mirroring.  If
these logs are rotated, within a few days, I should be back to having
all active files in sync.


Short version:
I recognize your concern.  I suspect you are right, the disks could get
out of sync.  I was a bit concerned about this for a while myself.
However, the more I think about this, the more I keep coming to the "so
what?" conclusion.  My three tests indicated one can't universally even
demonstrate a difference in the written files, though I'd want to repeat
it an infinite number more times before I say "and there never will be a
difference". :)

Yes, ccd(4) mirroring is not for every application.  But for some, it
can be useful.  My above mentioned DNS/DHCP server is an example -- I'd
like to keep two copies of constantly changing data.  If I lose one, I'd
like to have rapid repair.  If I lose them both, it will not be the end
of the world.  I'm less likely to lose them both with ccd(4) than I am
without any mirroring.  This is good.  It isn't worth the effort of a
RAIDframe kernel to me, it isn't worth the price of an Accusys box to me.


Nick.
(shoulda bought a lottery ticket)
