Laurence Perkins wrote:
>> -----Original Message-----
>> From: Dale <rdalek1...@gmail.com>
>> Sent: Tuesday, April 12, 2022 10:08 AM
>> To: gentoo-user@lists.gentoo.org
>> Subject: Re: [gentoo-user] Hard drive error from SMART
>>
>> Rich Freeman wrote:
>>> On Mon, Apr 11, 2022 at 9:27 PM Dale <rdalek1...@gmail.com> wrote:
>>>> Thoughts.  Replace as soon as drive arrives or wait and see?
>>>>
>>> So, first of all just about all my hard drives are in a RAID at this
>>> point, so I have a higher tolerance for issues.
>>>
>>> If a drive is under warranty I'll usually try to see if they will RMA
>>> it.  More often than not they will, and in that case there is really
>>> no reason not to.  I'll do advance shipping and replace the drive
>>> before sending the old one back so that I mostly have redundancy the
>>> whole time.
>>>
>>> If it isn't under warranty then I'll scrub it and see what happens.
>>> I'll of course do SMART self-tests, but usually an error like this
>>> won't actually clear until you overwrite the offline sector so that
>>> the drive can reallocate it.  A RAID scrub/resilver/etc will overwrite
>>> the sector with the correct contents which will allow this to happen.
>>> (Otherwise there is no way for the drive to recover - if it knew what
>>> was stored there it wouldn't have an error in the first place.)
>>>
>>> If an error comes back then I'll replace the drive.  My drives are
>>> pretty large at this point so I don't like keeping unreliable drives
>>> around.  It just increases the risk of double failures, given that a
>>> large hard drive can take more than a day to replace.  Write speeds
>>> just don't keep pace with capacities.  I do have offline backups but I
>>> shudder at the thought of how long one of those would take to restore.
>>>
>>
>> Sadly, I don't have RAID here but to be honest, I really need to have it
>> given the data and my recent luck with hard drives.  Drives used to get
>> dumped because they were just too small to use anymore.  Nowadays, they seem
>> to break in some fashion long before their usefulness ends their lives.
>>
>> I remounted the drives and did a backup.  For anyone running up on this,
>> just in case one of the files got corrupted, I used a little trick to see if
>> I can figure out which one may be bad if any.  I took my rsync commands from
>> my little script and ran them one at a time with --dry-run added.  If a file
>> was to be updated on the backup that I hadn't changed or added, I was going
>> to check into it before updating my backups.  It could be that the backup
>> file was still good and the file on my drive reporting problems was bad.  In
>> that case, I would determine which was good and either restore it from
>> backups or allow it to be updated if needed.  Either way, I should have a
>> good file since the drive claims to have fixed the problem.  Now let us
>> pray.  :-D
>>
>> Drive isn't under warranty.  I may have to start buying new drives from
>> dealers.  Sometimes I find drives that are pulled from systems and have very
>> few hours on them.  Still, warranty may not last long.  Saves a lot of money
>> tho.
>>
>> USPS claims drive is on the way.  Left a distribution point and should
>> update again when it gets close.  First said Saturday, then said Friday.  I
>> think Friday is about right but if the wind blows right, maybe Thursday.
>>
>> I hope I have another port and power cable plug for the swap out.  At least
>> now, I can unmount it and swap without a lot of rebooting.  Since it's on
>> LVM, that part is easy.  Regretfully I have experience on that process.  :/
>>
>> Thanks to all.
>>
>> Dale
>>
>> :-)  :-)
>>
>>
> You can get up to 16X SATA PCI-e cards these days for pretty cheap.  So as
> long as you have the power to run another drive or two there's not much
> reason not to do RAID on the important stuff.  Also, the SATA protocol allows
> for port expanders, which are also pretty cheap.
>
> One of my favorite things about BTRFS is the data checksums.  If the drive
> returns garbage, it turns into a read error.  Also, if you can't do real
> RAID, but have excess space you can tell it to keep two copies of everything.
> Doesn't help with total drive failure, but does protect against the
> occasional failed sector.  If you don't mind writes taking twice as long
> anyway.
>
> LMP

I looked into a card a good while back and they were pretty pricey at the
time.  Do you happen to have some search terms I can use on eBay, Amazon
etc?  I know some chipsets work better on Linux out of the box.  I don't
need to buy one that doesn't work or only works with the threat of a
sledgehammer.  lol  I've also looked into that other thing, SAS? or
something.  It's been a while tho.

I'm pretty good at doing backups.  I do Gentoo updates on Saturday, and
sometimes Sunday.  While the updates are downloading, I update my backups.
It's almost like a religion for me.  I was just more cautious earlier.  I
suspected a file could be corrupted somewhere and wanted to be sure it
wasn't something important.  I have some files that, if lost, I may not be
able to download again.  They don't exist anywhere else.  A few I got from
some Govt archive; they are really old and have since been removed, or at
least I can't find them anymore.

I've given serious thought to switching to BTRFS.  Thing is, I'm still
trying to get LVM figured out.  Plus, LVM is well maintained and should be
for a good long while, and it works for me.  Still, if I could afford to
have several new drives all at once, I'd certainly play with it.  It could
very well be better.

The one thing I wish is that LVM had a GUI where you could do everything
from it.  During my recent rearrangement of drives, I learned that you
can't do a lot of things within Webmin.  It does some things but not
everything.  Plus, you have to have a running GUI to use it.  In that case,
I had to unmount /home which meant no KDE, so no Webmin either.  Still,
that could cause trouble too.  I dunno.

Thanks.

Dale

:-)  :-)
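
P.S.  The dry-run check from the quoted message above could also be
scripted.  What follows is only a rough sketch in Python, not the rsync
script mentioned there.  The function names (file_digest,
unexpected_changes) and the use of SHA-256 content hashes are assumptions
made for illustration.  Plain rsync decides by size and modification time
unless --checksum is given, so this sketch compares content instead and
reports files that differ from the backup even though they do not look
edited.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def unexpected_changes(source_root, backup_root):
    """List files whose content differs from the backup copy although
    their modification time is not newer than the backup's.

    Hypothetical helper: files that are new, or that were edited after
    the last backup, are skipped because a difference there is expected.
    """
    source_root, backup_root = Path(source_root), Path(backup_root)
    suspects = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        dst = backup_root / rel
        if not dst.is_file():
            continue  # not in the backup yet, so a copy is expected
        if src.stat().st_mtime > dst.stat().st_mtime:
            continue  # edited since the last backup, so a change is expected
        if file_digest(src) != file_digest(dst):
            suspects.append(rel.as_posix())
    return suspects
```

Anything this returns deserves a look by hand before the backup is
updated, since either the copy on the drive or the copy in the backup
could be the bad one.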