Laurence Perkins wrote:
>> -----Original Message-----
>> From: Dale <rdalek1...@gmail.com> 
>> Sent: Tuesday, April 12, 2022 10:08 AM
>> To: gentoo-user@lists.gentoo.org
>> Subject: Re: [gentoo-user] Hard drive error from SMART
>>
>> Rich Freeman wrote:
>>> On Mon, Apr 11, 2022 at 9:27 PM Dale <rdalek1...@gmail.com> wrote:
>>>> Thoughts.  Replace as soon as drive arrives or wait and see?
>>>>
>>> So, first of all just about all my hard drives are in a RAID at this 
>>> point, so I have a higher tolerance for issues.
>>>
>>> If a drive is under warranty I'll usually try to see if they will RMA 
>>> it.  More often than not they will, and in that case there is really 
>>> no reason not to.  I'll do advance shipping and replace the drive 
>>> before sending the old one back so that I mostly have redundancy the 
>>> whole time.
>>>
>>> If it isn't under warranty then I'll scrub it and see what happens.
>>> I'll of course do SMART self-tests, but usually an error like this 
>>> won't actually clear until you overwrite the offline sector so that 
>>> the drive can reallocate it.  A RAID scrub/resilver/etc will overwrite 
>>> the sector with the correct contents which will allow this to happen.
>>> (Otherwise there is no way for the drive to recover - if it knew what 
>>> was stored there it wouldn't have an error in the first place.)
>>>
>>> If an error comes back then I'll replace the drive.  My drives are 
>>> pretty large at this point so I don't like keeping unreliable drives 
>>> around.  It just increases the risk of double failures, given that a 
>>> large hard drive can take more than a day to replace.  Write speeds 
>>> just don't keep pace with capacities.  I do have offline backups but I 
>>> shudder at the thought of how long one of those would take to restore.
>>>
>>
>> Sadly, I don't have RAID here but to be honest, I really need to have it 
>> given the data and my recent luck with hard drives.  Drives used to get 
>> dumped because they were just too small to use anymore.  Nowadays, they seem 
>> to break in some fashion long before their usefulness ends their lives. 
>>
>> I remounted the drives and did a backup.  For anyone running up on this, 
>> just in case one of the files got corrupted, I used a little trick to see if 
>> I can figure out which one may be bad if any.  I took my rsync commands from 
>> my little script and ran them one at a time with --dry-run added.  If a file 
>> was to be updated on the backup that I hadn't changed or added, I was going 
>> to check into it before updating my backups.  It could be that the backup 
>> file was still good and the file on my drive reporting problems was bad.  In 
>> that case, I would determine which was good and either restore it from 
>> backups or allow it to be updated if needed.  Either way, I should have a 
>> good file since the drive claims to have fixed the problem.  Now let us 
>> pray.  :-D 
>>
>> Drive isn't under warranty.  I may have to start buying new drives from 
>> dealers.  Sometimes I find drives that are pulled from systems and have very 
>> few hours on them.  Still, warranty may not last long.  Saves a lot of money 
>> tho. 
>>
>> USPS claims drive is on the way.  Left a distribution point and should 
>> update again when it gets close.  First said Saturday, then said Friday.  I 
>> think Friday is about right but if the wind blows right, maybe Thursday. 
>>
>> I hope I have another port and power cable plug for the swap out.  At least 
>> now, I can unmount it and swap without a lot of rebooting.  Since it's on 
>> LVM, that part is easy.  Regretfully I have experience on that process.  :/
>>
>> Thanks to all. 
>>
>> Dale
>>
>> :-)  :-) 
>>
>>
> You can get up to 16X SATA PCI-e cards these days for pretty cheap.  So as 
> long as you have the power to run another drive or two there's not much 
> reason not to do RAID on the important stuff.  Also, the SATA protocol allows 
> for port expanders, which are also pretty cheap.  
>
> One of my favorite things about BTRFS is the data checksums.  If the drive 
> returns garbage, it turns into a read error.  Also, if you can't do real 
> RAID, but have excess space you can tell it to keep two copies of everything. 
>  Doesn't help with total drive failure, but does protect against the 
> occasional failed sector.  If you don't mind writes taking twice as long 
> anyway.
>
> LMP


I looked into a card a good while back and they were pretty pricey at
the time.  You happen to have some search terms I can search for on
ebay, Amazon etc?  I know some chipsets work better on Linux out of the
box.  I don't need to buy one that doesn't work or only works with the
threat of a sledge hammer.  lol  I've also looked into that other thing,
SAS? or something.  It's been a while tho. 

I'm pretty good at doing backups.  I do Gentoo updates on Saturday, and
sometimes Sunday.  While the updates are downloading, I update my
backups.  It's almost like a religion for me.  I was just more cautious
earlier.  I suspect a file could be corrupted somewhere but wanted to be
sure it wasn't something important.  I have some files that, if lost, I
may not be able to download again.  They don't exist.  A few I got from some
Govt archive that are really old but since removed, or at least I can't
find them anymore. 
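
For anyone wanting to script the dry-run check described earlier in the
thread, here is a minimal Python sketch.  It assumes rsync is installed,
and the function names (rsync_dry_run, overwritten_files) and any
source/destination paths are made up for illustration.  It lists files
that already exist on the backup and would be overwritten, which are the
ones worth inspecting before a real run.  By default rsync compares only
size and modification time, so the optional --checksum flag is included
for the case where a file's contents changed without either of those
changing.

```python
import subprocess
import sys


def rsync_dry_run(src, dst, checksum=False):
    """Run rsync in dry-run mode and return its itemized output lines."""
    cmd = ["rsync", "-a", "--dry-run", "--itemize-changes"]
    if checksum:
        # Compare file contents instead of only size and mtime (slower).
        cmd.append("--checksum")
    cmd += [src, dst]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def overwritten_files(itemized_lines):
    """Return paths of existing backup files that rsync would overwrite."""
    suspects = []
    for line in itemized_lines:
        # Each itemized line is an 11-character flag field, a space, then the path.
        flags, _, path = line.partition(" ")
        # ">f" means a regular file would be transferred;
        # "+++++++++" marks a brand-new file, which is expected and skipped.
        if flags.startswith(">f") and "+++++++++" not in flags:
            suspects.append(path)
    return suspects


if __name__ == "__main__":
    # Hypothetical usage: python check_backup.py /home/ /mnt/backup/home/
    for path in overwritten_files(rsync_dry_run(sys.argv[1], sys.argv[2])):
        print(path)
```

Anything it prints that you know you did not change is a candidate for
comparing against the backup copy before letting rsync update it.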

I've given serious thought to switching to BTRFS.  Thing is, I'm still
trying to get LVM figured out.  Plus, LVM is well maintained and should
be for a good long while, plus it works for me.  Still, if I could
afford to have several new drives all at once, I'd certainly play with
it.  It could very well be better.  The one thing I wish is that LVM had
a GUI where you could do everything.  During my recent rearrangement
of drives, I learned that you can't do a lot of things within webmin. 
It does some things but not everything.  Plus, you have to have a
running GUI to use it.  In that case, I had to unmount /home which meant
no KDE, so no Webmin either.  Still, that could cause trouble too.  I
dunno. 

Thanks.

Dale

:-)  :-)
