> > from my understanding of how google do things, losing a drive just
> > means they need to replace it. so it's cheaper to let drives fail.
> > on the other hand, we have our main filesystem raided on an aoe
> > appliance. suppose that one of those raids has two disks showing
> > a smart status of "will fail". in this case i want to know the
> > elevated risk and i will allocate a spare drive to replace at least
> > one of the drives.
> >
> > i guess this is the long way of saying, it all depends on how painful
> > losing your data might be. if it's painful enough, even a poor tool
> > like smart is better than nothing.
> >
> I agree (plus I was just wrong about SMART at first), though I do
> think your example above is about preventing downtime, not so much
> data loss (Even without smart entirely, and all the disks come up
> corrupt, we're all backed up within some acceptable window, right?)

i don't know. if you lean that direction, then the only thing raid
gives you is reduced downtime.

i think of raid as reliable storage. backups are for saving one's bacon
in the face of other disasters. you know, sysadmin mistakes,
misconfiguration, code gone wild, building burns down: disaster
recovery. (and if my experience with backups is any indication, it's
best not to rely on them.)

but this thinking is probably specific to how i use raid. i imagine
the exact answer on what raid gives you should be worked out based on
the application. for linux-type filesystems, e.g., raid won't save
your accidentally deleted files.

- erik
