On Wed, Mar 3, 2010 at 6:26 AM, Stroller <strol...@stellar.eclipse.co.uk> wrote:
>
> On 3 Mar 2010, at 14:00, Mark Knecht wrote:
>>
>> On Wed, Mar 3, 2010 at 4:24 AM, Stroller <strol...@stellar.eclipse.co.uk>
>> wrote:
>>>
>>> There seem to have been a few people posting with filesystem corruption in
>>> the last week or two. It seems to be my turn, so I hope it isn't contagious.
>>> The cause here is quite clear - whilst rummaging in the server cupboard
>>> yesterday, power to the machine was accidentally disconnected.
>>
>> ...
>>  Sorry for your problems. I've had a rash of machine problems over
>> the last 6 weeks. No fun. I feel for you.
>>
>>  In my most recent case what looked like a simple disk corruption
>> problem was really a prelude to the drive just plain going bad. Have
>> you tried smartctl to see what it says about the drive at this point?
>>
>>  It would be even more frustrating to chroot in, do all the work,
>> think you had it fixed and then the underlying foundation of your
>> house crumbles beneath you 3 weeks from now.
>
> I don't think this is a problem. I would love to know what others think of
> the `smartctl` output:
>
>
> r...@sysresccd /root % smartctl -H /dev/sda
> smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> Please note the following marginal Attributes:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   9 Power_On_Seconds        0x0012   001   001   020    Old_age   Always   FAILING_NOW 44803h+12m+16s
>
> r...@sysresccd /root % smartctl -i /dev/sda
> smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF INFORMATION SECTION ===
> Model Family:     Fujitsu MPA..MPG series
> Device Model:     FUJITSU MPF3204AT
> Serial Number:    05030567
> Firmware Version: 0028
> User Capacity:    20,496,236,544 bytes
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   5
> ATA Standard is:  ATA/ATAPI-5 T13 1321D revision 1
> Local Time is:    Wed Mar  3 14:14:31 2010 UTC
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> r...@sysresccd /root %
>
>
> This looks to me like smartctl is going "OMG! What an ancient drive!" - it's
> a 20gig EIDE drive and if my pocket calculator is correct (44803/24/365),
> it's seen 5 years of active use - and that's the "marginal attribute"
> referred to.
>
> Like I said, the power plug was accidentally pulled on this drive, so I'm
> inclined to attribute the corruption only to that, not to the drive actually
> failing.
>
> The drive is in a computer that has rarely been turned off in the last
> couple of years, and is also in a warm environment, conditions which are
> ideal. I appreciate the latter seems unintuitive, but in fact studies have
> shown that drives in somewhat warm environments last longer than those that
> are cooled.
>
> That it passes the "SMART overall-health self-assessment test" suggests to
> me that it is chugging away quite happily.
>
> I would have dismissed your concerns were it not for the capitalised
> "FAILING_NOW" in the output. Like I say, I think this is just smartctl
> declaring "OMG! this drive is old!", but I open this matter to the list for
> discussion (should you wish).
>
> I think I'm actually nearly ready to migrate off this system. The power was
> actually pulled as I installed 3 new (to me) rackmount machines in the
> server cupboard - the plan is to have identical machines running RAID, so
> that in the case of ANY problems I have spares available. I take
> nightly backups of the important data on this machine; however, I'd prefer it
> to run just a few weeks longer to allow me to migrate at my own
> leisure.
>
> Stroller.

I've had two machines go bad due to hard drive problems in the last 6
weeks. One drive was 4.5 years old, the other 6 years old. I have no
experience with SMART; I'm just learning about it. However, the data is
generated by the microcontroller in the hard drive, according to the
drive manufacturer's own criteria, so if the drive is telling you it's
failing then...
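
For what it's worth, here is a rough Python sketch of how I read the
attribute row you quoted. As far as I understand the smartctl man page,
WHEN_FAILED shows FAILING_NOW when the normalized VALUE is at or below
THRESH, and an Old_age attribute in that state points to end of design
life rather than a predicted failure. The function names and the
hard-coded row are my own, purely for illustration:

```python
import re

# Attribute row copied from the quoted smartctl output, rejoined onto one line.
ROW = ("9 Power_On_Seconds 0x0012 001 001 020 "
       "Old_age Always FAILING_NOW 44803h+12m+16s")


def parse_row(row):
    """Split one attribute row into named fields (column order as quoted)."""
    keys = ["id", "name", "flag", "value", "worst", "thresh",
            "type", "updated", "when_failed", "raw"]
    attr = dict(zip(keys, row.split()))
    for key in ("id", "value", "worst", "thresh"):
        attr[key] = int(attr[key])
    return attr


def when_failed(attr):
    """Recompute the WHEN_FAILED column from the normalized values."""
    if attr["value"] <= attr["thresh"]:
        return "FAILING_NOW"
    if attr["worst"] <= attr["thresh"]:
        return "In_the_past"
    return "-"


def raw_to_years(raw):
    """Convert a raw value such as 44803h+12m+16s into years powered on."""
    match = re.fullmatch(r"(\d+)h\+(\d+)m\+(\d+)s", raw)
    hours, minutes, seconds = map(int, match.groups())
    total_hours = hours + minutes / 60 + seconds / 3600
    return total_hours / 24 / 365


attr = parse_row(ROW)
print(attr["name"], attr["type"], when_failed(attr))
print(round(raw_to_years(attr["raw"]), 2), "years powered on")
```

That agrees with your pocket calculator: a little over 5 years of
power-on time, flagged only because VALUE (001) has dropped below
THRESH (020) on an Old_age attribute.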

My 4.5-year-old drive actually stopped producing SMART output somewhere
along the way before it failed. I wasn't using SMART on the 6-year-old
drive at the time, so I have no data from it, but it was in an
environment where the UPS went through a lot of abuse.

It sounds like you have good backups, so just make sure they really are
good and do what you want.

- Mark
