Re: [gentoo-user] dying hard drive

2010-07-22 Thread Mick
On Thursday 22 July 2010 05:14:08 David Relson wrote:
 /var/log/messages has indicated a slew of XFS problems on an external
 USB hard drive (see attachment).  These look pretty fatal.  Anybody
 think the file system is recoverable?

You'll have to try to recover it, to see if it is possible:  xfs is vulnerable 
to power interruptions, so a faulty USB cable can cause corruption.  I haven't 
had a corrupted xfs system for years now, so I put initial experiences down to 
early (buggy) versions of the drivers.  In my case, I was not able to recover 
and I had to reformat and start again.  After a couple of early mortality 
cases the fs in question carried on for 4 years without a single problem.

Try xfs_check and xfs_repair with the drive unmounted, but first use 
xfsdump/xfsrestore or dd to make a backup just in case.
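A minimal sketch of that order of operations. It assumes the damaged partition is /dev/sdb1 and that there is room for a raw image at /mnt/backup/sdb1.img; both are hypothetical, as are the `run` and `xfs_rescue_plan` names. With DRY_RUN set, `run` only prints each command, so the plan can be reviewed before anything touches the disk. `xfs_repair -n` is xfs_repair's no-modify mode. dd's `conv=noerror,sync` keeps the copy going past read errors and pads unreadable blocks with zeros.

```sh
#!/bin/sh
# Sketch only: DEV and IMG are hypothetical; set them for your system.
DEV=${DEV:-/dev/sdb1}
IMG=${IMG:-/mnt/backup/sdb1.img}

# Print a command instead of running it when DRY_RUN is non-empty.
run() {
    if [ -n "$DRY_RUN" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

xfs_rescue_plan() {
    # Check and repair with the drive unmounted.
    run umount "$DEV"
    # Raw image first: keep going past read errors, zero-pad unreadable blocks.
    run dd if="$DEV" of="$IMG" bs=64k conv=noerror,sync || return 1
    # No-modify mode: report what would be repaired, change nothing.
    run xfs_repair -n "$DEV"
    # The real repair, only after the image exists.
    run xfs_repair "$DEV"
}
```

After sourcing it, `DRY_RUN=1 xfs_rescue_plan` prints the four commands; running `xfs_rescue_plan` as root without DRY_RUN executes them.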

 Also, palimpsest is reporting (graphically) that my external hard drive is
 about to die.  Can I save its report to a text file?

Sorry, can't help with that because I'm not familiar with the application.  
You could use sys-apps/smartmontools if you want a console application that 
you can copy and paste from.
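For the "save it to a text file" part: smartctl prints plain text, so an ordinary shell redirect captures the whole report. This is a small sketch that assumes the USB drive appears as /dev/sdb, which is hypothetical. Some USB-SATA bridges only pass SMART data through with `-d sat`. The `smart_health` helper is also hypothetical. It reads the overall-health verdict back out of a saved report.

```sh
#!/bin/sh
# Save the full report once (as root; the device name is an example):
#   smartctl -a /dev/sdb > smart-report.txt
#   smartctl -d sat -a /dev/sdb > smart-report.txt   # if the USB bridge needs it

# Print the overall-health verdict (e.g. PASSED) from a saved report file.
smart_health() {
    sed -n 's/^SMART overall-health self-assessment test result: *//p' "$1"
}
```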
-- 
Regards,
Mick




Re: [gentoo-user] dying hard drive

2010-07-22 Thread Paul Hartman
On Thu, Jul 22, 2010 at 1:11 AM, Mick michaelkintz...@gmail.com wrote:
 On Thursday 22 July 2010 05:14:08 David Relson wrote:
 /var/log/messages has indicated a slew of XFS problems on an external
 USB hard drive (see attachment).  These look pretty fatal.  Anybody
 think the file system is recoverable?

 You'll have to try to recover it, to see if it is possible:  xfs is vulnerable
 to power interruptions, so a faulty USB cable can cause corruption.

I had exactly this problem with a USB HDD formatted with xfs. The USB
cable that it came with was rubbish... the drive would disconnect and
reconnect on its own for no apparent reason, and corruption happened
of course. I replaced it with another cable and it worked fine after
that.

A few months later the power supply started to fail: it would
occasionally not provide enough power and the drive would go offline
or start beeping/clicking. At first I thought the disk was bad
(clicking is never good) but it was actually the sound of the drive
trying to spin up and failing. Eventually the power brick couldn't
even spin up the drive at all. I replaced the power supply and now the
drive works fine again, for now...



Re: [gentoo-user] dying hard drive?

2006-01-19 Thread matthew . garman
On Fri, Jan 13, 2006 at 06:15:20PM -0700, Richard Fish wrote:
 I was able to resurrect a drive with a similar problem with:
 dd if=/dev/zero of=/dev/hda bs=32k
 You can then check that the drive is working with:
 dd if=/dev/hda of=/dev/null bs=32k
 
 If either command fails, then it is time to replace the drive.  In
 my case, that drive was still working perfectly 18 months later
 when I sold it to someone else.

I don't think that's going to work for me:

# dd if=/dev/zero of=/dev/hda bs=32k
dd: writing `/dev/hda': No space left on device
4884091+0 records in
4884090+0 records out

# dd if=/dev/hda of=/dev/null bs=32k
dd: reading `/dev/hda': Input/output error
3229627+1 records in
3229627+1 records out

D'oh!

Time to find that RMA form!
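Those numbers can be checked with shell arithmetic. This sketch takes bs=32k as 32768 bytes and uses the 160,041,885,696-byte capacity from the smartctl output posted elsewhere in this thread. The function names are made up.

The disk holds 4884090 full 32 KiB records plus 24576 bytes. So "No space left on device" on the write pass only means dd reached the end of the disk. The read pass is the real failure. The record dd was partway through starts at sector 206696128, 86 sectors short of the LBAsect=206696214 the kernel had been logging. That suggests it is the same bad area.

```sh
#!/bin/sh
# Turn dd record counts into disk positions. Values come from this thread.
BS=32768                 # dd bs=32k
CAPACITY=160041885696    # "User Capacity" reported by smartctl

# Full bs-sized records that fit on the disk, and the bytes left over.
full_records() { echo $(( CAPACITY / BS )); }
leftover_bytes() { echo $(( CAPACITY % BS )); }

# 512-byte sector where record number $1 (counting from 0) starts.
record_to_sector() { echo $(( $1 * BS / 512 )); }
```

Here `full_records` prints 4884090, `leftover_bytes` prints 24576, and `record_to_sector 3229627` prints 206696128.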

Thanks for the help,
Matt

-- 
Matt Garman
email at: http://raw-sewage.net/index.php?file=email
-- 
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] dying hard drive?

2006-01-13 Thread Tim Igoe

[EMAIL PROTECTED] wrote:


I keep getting hard drive errors in my kernel log/dmesg that have me
worried.  From /var/log/kernel/current:

Jan 13 11:42:31 [kernel] hda: dma_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
   - Last output repeated 7 times -
Jan 13 11:42:39 [kernel] hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=206696214, high=12, low=5369622, sector=206695927
Jan 13 11:42:39 [kernel] ide: failed opcode was: unknown
Jan 13 11:42:40 [kernel] hda: dma_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }


 

I noticed exactly the same message less than an hour before my Maxtor 
DiamondMax 9 packed in just before Christmas. Annoyingly, the drive wouldn't 
mount the main data partition, but everything else seemed intact. I 
managed to recover all my data from the drive using dd once I had a new 
drive.


I'd recommend backing up anything that's essential on the drive and 
preparing for it to give up the ghost.



The drive is a 160 GB PATA Samsung.  It's about two or three years
old, running 24x7 (although lightly).  The drive has three
partitions, all are ext3.

When I started seeing the above messages, I ran 


   fsck.ext3 -f -v -c -c /dev/hda?

on all three partitions.  Note that the -c flag adds a bad blocks
check (given twice, a non-destructive read-write scan).

I also ran

   smartctl -t long /dev/hda

on the drive.  Apparently, an error was found (details below).  I'm
not sure if this drive is actually dying, though, as the following
article (by the smartmontools author) suggests that one or two
errors on a drive are nothing to worry about.  Also, the SMART
overall-health self-assessment test comes back as PASSED.

   http://www.linuxjournal.com/article/6983

But the constant kernel messages, along with the error in the long
SMART test, concern me.  At this point, I'm not really sure what my
next steps should be, so I'm looking for any suggestions or advice.

Thanks!
Matt



# smartctl -a /dev/hda

smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG SP1614N
Serial Number:    0642J1FW903226
Firmware Version: TM100-24
User Capacity:    160,041,885,696 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Fri Jan 13 15:24:27 2006 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 245) Self-test routine in progress...
                                        50% of test remaining.
Total time to complete Offline 
data collection:                 (5760) seconds.

Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine 
recommended polling time:        (   1) minutes.

Extended self-test routine
recommended polling time:        (  96) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   051    Pre-fail  Always       -       1
  3 Spin_Up_Time            0x0007   061   061   000    Pre-fail  Always       -       6528
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       73
  5 Reallocated_Sector_Ct   0x0033   253   253   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0024   253   253   000    Old_age   Offline      -       0
  9 Power_On_Half_Minutes   0x0032   098   098   000    Old_age   Always       -       11505h+32m
10 Spin_Retry_Count   

Re: [gentoo-user] dying hard drive?

2006-01-13 Thread Willie Wong
On Fri, Jan 13, 2006 at 03:39:46PM -0600, Penguin Lover [EMAIL PROTECTED] 
squawked:
 
 I keep getting hard drive errors in my kernel log/dmesg that have me
 worried.  From /var/log/kernel/current:
 
 Jan 13 11:42:31 [kernel] hda: dma_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
 - Last output repeated 7 times -
 Jan 13 11:42:39 [kernel] hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=206696214, high=12, low=5369622, sector=206695927
 Jan 13 11:42:39 [kernel] ide: failed opcode was: unknown
 Jan 13 11:42:40 [kernel] hda: dma_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
 

Do you run smartd? If you do, did it complain? 
(grep SMART /var/log/everything/*)

Usually UncorrectableError means that some spots on your hard drive are
not readable. And if it keeps complaining, it might be a sign that
something is wrong with your drive. (Of course, it could also be flaky
connectors.)
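To see which spot the kernel means, the numbers in the quoted error line fit together under one reading. LBAsect would be the failing sector's 48-bit address, printed again in two halves as high (upper 24 bits) and low (lower 24 bits). sector= would be where the failed request began. That reading of the log format is an assumption, but the arithmetic agrees with it. The helper names below are hypothetical.

```sh
#!/bin/sh
# Rebuild a 48-bit LBA from the high/low halves in an hda dma_intr error line.
# $1 = high (upper 24 bits), $2 = low (lower 24 bits)
lba48() {
    echo $(( ($1 << 24) | $2 ))
}

# Sectors between the start of the failed request and the failing sector.
# $1 = LBAsect value, $2 = sector= value
offset_in_request() {
    echo $(( $1 - $2 ))
}
```

Here `lba48 12 5369622` prints 206696214, matching LBAsect, and `offset_in_request 206696214 206695927` prints 287.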

Maybe you can take a look at 
http://www.samsung.com/Products/HardDiskDrive/troubleshooting/index.htm

Often you get one or two bad sectors from environmental causes: a
power blip, for one, and my roommate slamming the door too hard on his
way out, for another. If that is the case, most hard drive vendors
provide a diagnostic tool that lets you remap those few sectors to
spare ones on the disk. (Yes, drives keep a few extra sectors just for
that purpose.)
 
 The drive is a 160 GB PATA Samsung.  It's about two or three years
 old, running 24x7 (although lightly).  The drive has three
 partitions, all are ext3.

snip
 
 SMART Self-test log structure revision number 1
 Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
 # 1  Extended offline    Completed: read failure       00%     11486         262886799
 # 2  Short offline       Completed without error       00%     11483         -
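The same log can be printed on its own with `smartctl -l selftest /dev/hda`. Below is a hypothetical helper that pulls LBA_of_first_error out of every failed entry. It assumes the column layout shown above, where the LBA is the last field.

```sh
#!/bin/sh
# Read `smartctl -l selftest` output on stdin and print the
# LBA_of_first_error (last field) of every entry whose status says "failure".
selftest_bad_lbas() {
    awk '/^# *[0-9]+/ && /failure/ { print $NF }'
}
```

On the log above it prints 262886799.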

W
-- 
Statistics are like a Bikini: 
  showing interesting details but hiding the important stuff.
Sortir en Pantoufles: up 62 days, 14:29



Re: [gentoo-user] dying hard drive?

2006-01-13 Thread Richard Fish
On 1/13/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

 I keep getting hard drive errors in my kernel log/dmesg that have me
 worried.  From /var/log/kernel/current:

 Jan 13 11:42:31 [kernel] hda: dma_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
 - Last output repeated 7 times -
 Jan 13 11:42:39 [kernel] hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=206696214, high=12, low=5369622, sector=206695927
 Jan 13 11:42:39 [kernel] ide: failed opcode was: unknown
 Jan 13 11:42:40 [kernel] hda: dma_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }

These mean the blocks are corrupt and cannot be read.  Whatever was
on those blocks is now lost.
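The smartmontools bad-block HOWTO describes a way to find out what was lost on an ext3 partition. First, convert the disk LBA into a filesystem block number. Then ask debugfs which inode and file own that block. In this sketch the function name is hypothetical. The partition start sector and the block size have to come from your own system, via `fdisk -lu` and `tune2fs -l`.

```sh
#!/bin/sh
# Map a bad disk LBA to an ext3 block number inside its partition.
# $1 = bad LBA (512-byte sectors)
# $2 = partition start sector (from `fdisk -lu /dev/hda`)
# $3 = filesystem block size in bytes ("Block size" in `tune2fs -l /dev/hdaN`)
lba_to_fsblock() {
    if [ "$1" -lt "$2" ]; then
        echo "LBA $1 is before this partition" >&2
        return 1
    fi
    echo $(( ($1 - $2) * 512 / $3 ))
}

# Then, with block number B (debugfs opens the device read-only by default):
#   debugfs -R "icheck B" /dev/hdaN    # which inode owns block B
#   debugfs -R "ncheck I" /dev/hdaN    # which file name has inode I
```

For example, with made-up numbers, `lba_to_fsblock 1000 63 4096` prints 117.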

 On the drive.  Apparently, an error was found (details below).  I'm
 not sure if this drive is actually dying, though, as the following
 article (by the smartmontools author) suggests that one or two
 errors on a drive are nothing to worry about.  Also, the SMART
 overall-health self-assessment test comes back as PASSED.

I was able to resurrect a drive with a similar problem with:

dd if=/dev/zero of=/dev/hda bs=32k

!DANGER! The above command will destroy all data on the drive... but by
writing to those sectors you can cause the drive to remap them to
sectors reserved for that purpose.
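One way to tell whether the remap actually happened is to compare the raw Reallocated_Sector_Ct value before and after the zero-fill. That is attribute 5 in the `smartctl -A` table. Below is a hypothetical helper. It assumes the usual ten-column attribute layout, where RAW_VALUE starts in the tenth field.

```sh
#!/bin/sh
# Print the first token of RAW_VALUE for the attribute named $1,
# reading `smartctl -A` output on stdin.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Usage idea (device name is an example):
#   smartctl -A /dev/hda | smart_raw Reallocated_Sector_Ct   # before
#   ... run the zero-fill ...
#   smartctl -A /dev/hda | smart_raw Reallocated_Sector_Ct   # after
```

Fed the Samsung's attribute table from this thread, it prints 0 for Reallocated_Sector_Ct.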

You can then check that the drive is working with:

dd if=/dev/hda of=/dev/null bs=32k

If either command fails, then it is time to replace the drive.  In my
case, that drive was still working perfectly 18 months later when I
sold it to someone else.

In any case, time to make sure you have a good backup.

-Richard
