Re: [gentoo-user] smartctrl drive error @60%

Dale Tue, 01 Jul 2014 00:22:30 -0700

J. Roeleveld wrote:
> On Tuesday, July 01, 2014 06:52:10 AM Mick wrote:
>> On Sunday 29 Jun 2014 13:05:04 Rich Freeman wrote:
>>> On Sun, Jun 29, 2014 at 12:44 AM, Dale <rdalek1...@gmail.com> wrote:
>>>> What if I copied data to the drive until it was just about full.  I'm
>>>> thinking like maybe 90 or 95% or so.  If I do that and run the test
>>>> every few days, would it then catch a error after a few weeks or so of
>>>> testing?  I realize no one knows with 100% certainty...
>>> As you already said, nobody knows with 100% certainty.
>>>
>>> In the failures I've experienced I'd expect it to start catching
>>> errors within a few days.  However, on those drives the relocated
>>> sector count never increases, which suggests that the firmware never
>>> relocated those sectors when overwritten, which seems brain-dead to
>>> me.
>>>
>>> If the drive relocates the sectors, then conceivably it could go quite
>>> a long time until having errors, probably in an entirely different set
>>> of sectors.
>>>
>>> Even if it doesn't relocate, the reliability of the bad sectors could
>>> be high or low.
>>>
>>> Rich
>> What triggers a relocation?  I also have a drive which shows a sector
>> relocation pending, but for a few days now and after some tests that showed
>> no errors, it won't relocate it.
> I think a write to that sector should force a relocation.
>
> --
> Joost
>
>


I think you are right, Joost.  I should have tried some fixes that COULD
be destructive to see if a) they fix it and b) the data survives, other
than the bad part at least.  I forgot to do that and really wasn't sure
how to do it either.  One person posted a lot of info about it but it
was a bit deep for me.  It would have required some reading and, because
of health issues, I can't tackle that much at one time right now.

Here is what I did, though.  I got the new drive and rsynced the data
from the old drive to the new one.  I removed the LVM stuff from the old
drive, then used dd to erase the whole old drive, which took a while for
3TB.  o_O  After that, I ran the test.  It came back fine.  Check out
this snippet:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     16499         -
# 2  Extended offline    Completed without error       00%     16498         -
# 3  Short offline       Completed without error       00%     16475         -
# 4  Extended offline    Completed without error       00%     16466         -
# 5  Extended offline    Aborted by host               90%     16461         -
# 6  Extended offline    Completed: read failure       60%     16451         2905482560
# 7  Extended offline    Completed: read failure       60%     16432         2905482560
# 8  Extended offline    Completed: read failure       60%     16427         2905482560
# 9  Extended offline    Completed: read failure       60%     16394         2905482560
#10  Extended offline    Completed: read failure       60%     16389         2905482560
#11  Short offline       Completed without error       00%     16380         -
#12  Extended offline    Completed: read failure       60%     16365         2905482560
#13  Extended offline    Completed: read failure       60%     16352         2905482560
#14  Extended offline    Completed without error       00%      8044         -
#15  Extended offline    Completed without error       00%      3121         -
#16  Extended offline    Completed without error       00%      1548         -
#17  Short offline       Completed without error       00%      1141         -
#18  Extended offline    Completed without error       00%       719         -
#19  Extended offline    Completed without error       00%       525         -
#20  Short offline       Completed without error       00%       516         -
#21  Extended offline    Completed without error       00%        18         -
7 of 7 failed self-tests are outdated by newer successful extended offline self-test # 2
 
Note the very last line.  You can see all the failures, all at the same
LBA, but the last line says they are outdated because the drive passed a
newer extended test after the bad ones.  So, while I'm not holding my
breath, that is what SMART says.  It may blow smoke and make horrible
noises next week but right now, it says it is OK.
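
For anyone who wants to check that last line for themselves, here is a
rough Python sketch that reads the log and counts the failed tests that
are older than the newest extended test that passed.  It assumes each
entry sits on one line, the way smartctl prints it on a wide terminal,
and that entry # 1 is the most recent.  The names ROW, parse_log and
outdated_failures are made up for this example, and the rule is only my
reading of what the summary line means, not how smartctl itself does it.

```python
import re

# Self-test log from this post, one entry per line.
SAMPLE_LOG = """\
# 1  Short offline       Completed without error       00%     16499         -
# 2  Extended offline    Completed without error       00%     16498         -
# 3  Short offline       Completed without error       00%     16475         -
# 4  Extended offline    Completed without error       00%     16466         -
# 5  Extended offline    Aborted by host               90%     16461         -
# 6  Extended offline    Completed: read failure       60%     16451         2905482560
# 7  Extended offline    Completed: read failure       60%     16432         2905482560
# 8  Extended offline    Completed: read failure       60%     16427         2905482560
# 9  Extended offline    Completed: read failure       60%     16394         2905482560
#10  Extended offline    Completed: read failure       60%     16389         2905482560
#11  Short offline       Completed without error       00%     16380         -
#12  Extended offline    Completed: read failure       60%     16365         2905482560
#13  Extended offline    Completed: read failure       60%     16352         2905482560
#14  Extended offline    Completed without error       00%      8044         -
#15  Extended offline    Completed without error       00%      3121         -
#16  Extended offline    Completed without error       00%      1548         -
#17  Short offline       Completed without error       00%      1141         -
#18  Extended offline    Completed without error       00%       719         -
#19  Extended offline    Completed without error       00%       525         -
#20  Short offline       Completed without error       00%       516         -
#21  Extended offline    Completed without error       00%        18         -
"""

# Hypothetical pattern for one log entry; only covers the two test
# types that appear in this log.
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+"
    r"(?P<kind>Short offline|Extended offline)\s+"
    r"(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>\d+|-)\s*$"
)


def parse_log(text):
    """Turn the log text into a list of dicts, skipping lines that don't match."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append({
                "num": int(m.group("num")),
                "kind": m.group("kind"),
                "status": m.group("status"),
                "hours": int(m.group("hours")),
                "lba": None if m.group("lba") == "-" else int(m.group("lba")),
            })
    return rows


def outdated_failures(rows):
    """Return (number of newest passed extended test, count of older failures)."""
    passed = [r for r in rows
              if r["kind"] == "Extended offline"
              and r["status"] == "Completed without error"]
    if not passed:
        return None, 0
    # Assumption: the lowest entry number is the most recent test.
    newest = min(r["num"] for r in passed)
    older_failures = [r for r in rows
                      if "failure" in r["status"] and r["num"] > newest]
    return newest, len(older_failures)


print(outdated_failures(parse_log(SAMPLE_LOG)))  # -> (2, 7)
```

That gives test # 2 and 7 failures, which agrees with the summary line
in the log above.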

In the end, it seems something has to write to that specific sector and
then the drive will reallocate/move/whatever so that the bad part isn't
used anymore.  It seems dd did that, but I bet there are other tools that
could do it without losing data, other than what is in the bad spot of
course.  That's my simple idea at least.
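
Here is a small Python sketch of the arithmetic for aiming a write at
just that one sector instead of the whole drive.  I did NOT do it this
way, so treat it as untested.  It assumes the LBA in the SMART log
counts 512-byte logical sectors and that the drive has 4096-byte
physical sectors; both need checking for the drive in question.
/dev/sdX is a placeholder and the function names are made up.  The code
only builds the dd command as text and never runs it, since running it
wipes whatever is in that sector.  Whether the firmware then remaps the
sector is up to the drive.

```python
import shlex


def lba_to_offsets(lba, logical_sector=512, physical_sector=4096):
    """Convert a logical LBA into a byte offset and a physical block number."""
    byte_offset = lba * logical_sector
    per_block = physical_sector // logical_sector
    return {
        "byte_offset": byte_offset,
        "physical_block": byte_offset // physical_sector,
        # First logical LBA of the physical block that holds this LBA.
        "first_lba_of_block": (lba // per_block) * per_block,
    }


def dd_zero_command(device, lba, logical_sector=512, count=1):
    """Build (but do not run) a dd command that zeroes `count` sectors at `lba`."""
    args = [
        "dd",
        "if=/dev/zero",
        f"of={device}",
        f"bs={logical_sector}",
        f"seek={lba}",
        f"count={count}",
        "oflag=direct",  # GNU dd: bypass the page cache so the write hits the disk
    ]
    return " ".join(shlex.quote(a) for a in args)


BAD_LBA = 2905482560  # LBA_of_first_error from the self-test log

print(lba_to_offsets(BAD_LBA))
print(dd_zero_command("/dev/sdX", BAD_LBA))
```

On a drive with 4096-byte physical sectors it may take count=8 starting
at first_lba_of_block to rewrite the whole physical sector, but that is
a guess on my part.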

Hope that helps.  I wish I could have done the other stuff, kept notes
on commands and such, and then posted the results.  That MAY have
helped someone in the future.  My brain ain't what it used to be.  ;-)

Dale

:-)  :-)
