Robbert Haarman writes:
> Greg,
> 
> Again, you raise some interesting issues. I wonder how likely the 
> catastrophic failures you describe are, versus how likely it is that 
> things fail in a way where ccd actually helps you. I was hoping someone 
> else would comment on that, but that doesn't seem to have happened so 
> far.

When you do a "shutdown -r", has the system ever hung on you?  Has 
your system ever crashed/panicked/suffered a power outage?

How does ccd guarantee that the mirrors are in sync?  If it can't do 
that, then it's worse than using just a single disk, because a fsck 
is only going to look at one half of the mirror, and inconsistent 
data on the other half is not going to be touched.

> > So one thing that's still missing is a big, bold line at the top
> > that says:
> >  
> >   CCD Mirroring will eventually eat your data and you shouldn't use it!!
> 
> It's missing, because I am not at all convinced that claim is true.

I have an exercise at the end of this which may help with that...
 
> The way I see it: when you use CCD mirroring, your data is written to 
> multiple disks, rather than just one. In some situations, this won't 
> help you (all your disks die in a fire; you delete your own files; ...) 
> In some situations, this will help you (one of your disks fails, but you 
> still have correct data on others). In some situations, it is not as 
> good as other techniques (the cases you describe). It may or may not 
> still be better than no mirroring in these cases (for example, in the 
> case where one file gets corrupted, you may still have everything else 
> intact).

This is all fine-and-dandy when the system is up, everything is in 
sync, and all shutdowns are clean and all data is flushed properly, 
etc.  However: doing things "right" in the "everything is fine" case 
is only one part of the picture.  How it handles the "abnormal" cases 
is where ccd definitely has problems, and where "getting it right" 
becomes super-critical if you want to guarantee the integrity of the 
data...
 
> I definitely think that stating that CCD mirroring _will_ eat your data
> is FUD; short of bugs, CCD doesn't cause you to lose data; at worst, it 
> may not preserve data which other methods would have preserved.

At worst, it looks like it is preserving data when in fact it is not!!
If the mirror components get out of sync, you need to know about it, 
and you need to make sure you don't pretend that they are both 
correct.  Given enough time, and enough crashes/panics/unclean 
shutdowns, etc, it will lose data and it won't tell you about it.
(That it "may" eat your data is probably a better way of phrasing it, 
but "may" isn't quite strong enough to convey the seriousness of the 
situation!)

> > To promote the use of CCD Mirroring without noting the above major 
> > problems is a disservice to the novice who is likely not aware of 
> > the above failure modes.
> 
> You are right that it would be deceptive to advertize CCD mirroring as a 
> silver bullet. It would be a lie to say CCD mirroring is the best 
> mirroring method. However, my HOWTO does neither of these. It clearly 
> mentions that mirroring is no silver bullet (and that goes for _any_ 
> kind of mirroring), and that RAID is superior to CCD. The HOWTO might 
> actually not emphasize these points enough; I'll have a look at it 
> sometime and make changes if I deem them necessary.

The HOWTO needs to say that there is no mechanism for determining if 
the mirror components are in sync, and that in the event of a system 
crash/failure or write failure to either of the disks, the system 
should be taken down, and a "dd" done from the primary to the 
secondary (or secondary to primary, depending on which one the user 
guesses might be the most up-to-date).  In fact, if you had to poke 
the 'reset' button at all, you probably need to do a 'dd'... 
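
To make the resync idea concrete, here is a rough sketch in Python.  It 
is not dd: instead of copying blindly, it compares the two components 
chunk by chunk and rewrites only the chunks that differ, so you also 
learn how far out of sync they were.  The function name and the chunk 
size are my own inventions for illustration, it assumes both components 
are the same size, and it should only ever be run with the ccd 
unconfigured.

```python
def resync(source, target, chunk=1024 * 1024):
    """Make target match source; return how many chunks were rewritten.

    source is the component you have decided to trust, target is the
    one that gets overwritten.  Both names are hypothetical paths.
    """
    rewritten = 0
    with open(source, "rb") as src, open(target, "r+b") as dst:
        while True:
            good = src.read(chunk)
            if not good:
                break  # end of the trusted component
            pos = dst.tell()
            stale = dst.read(len(good))
            if stale != good:
                # Go back to where this chunk started and overwrite it.
                dst.seek(pos)
                dst.write(good)
                rewritten += 1
    return rewritten
```

A non-zero return value means the two halves had diverged, which is 
exactly the information ccd never gives you.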

> > To me, until the above have satisfactory 
> > answers, the only thing the CCD Mirroring HOWTO (and the ccd(4)/
> > ccdconfig(8) man-pages!) should recommend is:
> > 
> >   Don't use CCD Mirroring -- at best, it provides a false sense of 
> >   security.  At worst, it will eat your data.  If you need mirroring 
> >   functionality, use RAIDframe.
> 
> Again, you're making bold claims. I would like if someone else could 
> comment on them.

So would I :)  

> Does CCD mirroring really provide only a false sense of 
> security? Will it really eat your data? Or is it just that it's not as 
> good as RAIDframe, but still a valuable improvement over not using any 
> mirroring at all?

It's only valuable if you can guarantee that the mirrors are in sync.
If you can't keep them in sync all the time, then at some point it 
will eat your data, and until then it provides only a false sense of 
security.

> > Really.  RAIDframe works, and it doesn't suffer from the serious 
> > problems noted above.  
> 
> Agreed. However, RAIDframe requires compiling a custom kernel. Now. And 
> when you next upgrade your system. And the next time. Until it gets 
> included in the shipped kernel. 

Doing things right is sometimes a bit more work... :)

> CCD is easy to set up (once you figure 
> out the steps) and I think it provides some protection against harddisk 
> failures.

There is *some* protection, provided one can guarantee the mirrors 
are in sync at ccd configuration time. 

Here's what I'd encourage you (or anyone else) to do:

1) Create a ccd as you describe in the HOWTO and mount the filesystem.
2) Start extracting 5 copies of src.tar.gz onto the filesystem
   (simultaneously is preferred, but basically anything that will
   generate a lot of IO is what is needed here).
3) After that's been going for a while, and while it's still in
   progress, pull the power from the machine.
4) Fire the machine back up, configure the ccd again, and run fsck a
   few times to make sure the ccd filesystem is "clean".
5) Now unconfigure the ccd.
6) Do an md5 checksum of each of the parts of the mirror, and see if
   they differ.  (They shouldn't, but I bet they do!!)

If they differ, tell me how ccd detected that difference, and how it 
warned you that if the primary drive died that you'd have incorrect 
data.  If they don't differ, go buy a lottery ticket, cause it's
your lucky day! ;) 
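
If you'd rather script step 6, something along these lines should do 
as a minimal sketch.  It is written in Python purely for illustration; 
the function names are made up, and the two paths you pass in are 
assumed to be the raw mirror components (or image files of them) with 
the ccd unconfigured.

```python
import hashlib
import sys

CHUNK = 1024 * 1024  # read 1 MiB at a time so large partitions are fine


def md5_of(path):
    """Return the hex MD5 digest of everything readable from path."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def first_difference(path_a, path_b, chunk=CHUNK):
    """Return the offset of the first chunk that differs, or None."""
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            block_a = a.read(chunk)
            block_b = b.read(chunk)
            if block_a != block_b:
                return offset  # start of the first mismatching chunk
            if not block_a:
                return None  # both ended together with no mismatch
            offset += len(block_a)


# Only runs when invoked as: python script.py COMPONENT_A COMPONENT_B
if __name__ == "__main__" and len(sys.argv) == 3:
    a, b = sys.argv[1], sys.argv[2]
    print(md5_of(a), a)
    print(md5_of(b), b)
    where = first_difference(a, b)
    if where is None:
        print("components are identical")
    else:
        print("components first differ in the chunk at byte offset", where)
```

The checksums answer the yes/no question; the offset just gives you 
somewhere to start looking if you want to see what actually diverged.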

Later...

Greg Oster
