Re: Unusual drive activity attempting to install SusE

david merriman Wed, 02 Feb 2005 11:36:14 -0800

Volker Kuhlmann wrote:

I've been attempting to install SuSE 9.2 from a DVD ISO I downloaded and burnt.

The computer boots from the DVD fine, but as soon as the installer gets past the language and keyboard selection to the hardware side of things, and at each stage of the install thereafter, one of my two hard drives starts churning away - brrrrrp, brrrrrp, brrrrrp - twice a second for about 10 minutes at a time, before it finally stops and the installer goes to the next step. The installer is unresponsive while the drive is churning, but responds normally when it finally stops.


I/O errors on the hard disk.

I tried disconnecting that drive, and the SuSE installer was happy as a clam, going from one step to the next instantly. Plugged the drive back in, and back to Churn City again.


Yes, no doubt about it. Good post btw, with all the info that may be
relevant.

You need to check out that drive. While it's getting stuck there must be
errors logged in syslog (during install, peek around the other consoles
- alt-f2, alt-f3 etc). Smartmontools is superb for querying the disk's
own idea about its performance. Boot from your install media and select
"rescue system". When booted, log in as root. You want the smartctl
command, though you'll have to read the man page for it somewhere other
than the rescue system (no space for documentation).

Basics (for hda substitute appropriate device name):

 smartctl -a /dev/hda

 Display complete disk status. Of primary importance here are:
 Overall health status: if FAILED then it's a warranty / dustbin case
 Reallocated sectors: 1/month of disk lifetime are acceptable, sudden steep
        increase -> warranty/dustbin
 Any unreadable sectors logged in the error log section.
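
The three checks above could be automated along these lines. This is only a rough sketch, not anything smartmontools ships: the function names, the sample report text, and the idea of passing the disk's age in months are all assumptions made for illustration, and it expects each attribute row of the `smartctl -a` output to be on one unwrapped line.

```python
import re

# One row of the attribute table printed by `smartctl -a`:
# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value.
ATTR_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"\S+\s+\S+\s+\S+\s+(\d+)"
)
HEALTH = re.compile(r"overall-health self-assessment test result:\s*(\S+)")

# Made-up sample text for illustration only, not output from a real disk.
SAMPLE = """\
SMART overall-health self-assessment test result: PASSED
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   098   098   036    Old_age   Always       -       40
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       10963
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
"""


def summarize(report):
    """Pull the overall health verdict and the raw attribute values out of a report."""
    health = HEALTH.search(report)
    raw = {}
    for line in report.splitlines():
        row = ATTR_ROW.match(line)
        if row:
            raw[row.group(2)] = int(row.group(6))
    return {"health": health.group(1) if health else None, "raw": raw}


def findings(summary, age_months):
    """Apply the rules of thumb from the list above; age_months is supplied by the caller."""
    notes = []
    if summary["health"] != "PASSED":
        notes.append("overall health is not PASSED")
    reallocated = summary["raw"].get("Reallocated_Sector_Ct", 0)
    if reallocated > age_months:
        notes.append("more than one reallocated sector per month of lifetime")
    if summary["raw"].get("Current_Pending_Sector", 0) > 0:
        notes.append("sectors are pending reallocation")
    return notes


if __name__ == "__main__":
    for note in findings(summarize(SAMPLE), age_months=24):
        print(note)
```

The unreadable sectors in the error log section are not covered here; those still have to be read by eye.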

You may have to run

 smartctl -s on -S on -o on /dev/hda

once, or once after boot, to enable the smart feature of the disk.

You can run selftests with

 smartctl -t long /dev/hda
 smartctl -t short /dev/hda
 smartctl -t offline /dev/hda

and use -a to query their status. Don't start another test before the
previous is finished. The long one can run for an hour.
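
One way to tell whether a test is still running is the "Self-test execution status" value that `smartctl -a` prints in parentheses. The sketch below assumes the usual ATA layout of that status byte (upper four bits equal to 15 while a test is in progress, lower four bits giving the remaining work in steps of 10%); the function name and the sample strings are made up for illustration.

```python
import re

STATUS = re.compile(r"Self-test execution status:\s*\(\s*(\d+)\)")


def self_test_progress(report):
    """Return (in_progress, percent_remaining), or None if no status line is found.

    Assumes the ATA status byte layout: high nibble 15 means a self-test is
    running, low nibble is the remaining work in tens of percent.
    """
    match = STATUS.search(report)
    if match is None:
        return None
    value = int(match.group(1))
    in_progress = (value >> 4) == 0xF
    # The remaining percentage is only reported here while a test is running.
    percent_remaining = (value & 0x0F) * 10 if in_progress else 0
    return in_progress, percent_remaining


if __name__ == "__main__":
    idle = "Self-test execution status:      (   0) The previous self-test routine completed"
    busy = "Self-test execution status:      ( 249) Self-test routine in progress..."
    print(self_test_progress(idle))
    print(self_test_progress(busy))
```

If the first element comes back true, wait before issuing the next -t command.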

I think you will find a region of some dead sectors on this disk. If
this region increases in size -> warranty/dustbin. Or use the disk as a
study object for how bad disks behave, so you can spot it faster next
time.

Oh, to save you some time: Do *not* use badblocks to find out whether
your disk is faulty. It is simply not geared to test contemporary disk
technology in an appropriate manner. That is, on the extremely rare
occasion when it does indeed find a fault, the disk is truly stuffed.
In all other cases it doesn't see a fault while looking into the abyss,
but it takes a very long time looking. There are a few minor purposes
badblocks still serves, but its primary implied function is not one of
them.

Volker

I finally got a chance to run 'smartctl' on that drive last night.
Here's the result, after both a 'short' and 'long' test:


***************************************
[EMAIL PROTECTED] david]# smartctl -a /dev/hdf
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     ST340823A
Serial Number:    7EF1LE28
Firmware Version: 3.54
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   4
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Feb  3 01:30:37 2005 NZDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                       was completed without error.
                                       Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                       without error or no self-test has ever
                                       been run.
Total time to complete Offline
data collection:                 ( 422) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                       Auto Offline data collection on/off support.
                                       Suspend Offline collection upon new
                                       command.
                                       Offline surface scan supported.
                                       Self-test supported.
                                       No Conveyance Self-test supported.
                                       No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                       power-saving mode.
                                       Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                       No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  44) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000e   073   051   025    Old_age   Always       -       111928140
  3 Spin_Up_Time            0x0002   085   070   000    Old_age   Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       1018
  5 Reallocated_Sector_Ct   0x0032   100   100   036    Old_age   Always       -       0
  7 Seek_Error_Rate         0x000e   084   060   030    Old_age   Always       -       299427167
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       10963
 10 Spin_Retry_Count        0x0012   100   100   097    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   020    Old_age   Always       -       1367
194 Temperature_Celsius     0x0022   041   050   000    Old_age   Always       -       41
195 Hardware_ECC_Recovered  0x001a   065   055   000    Old_age   Always       -       17688879
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   100   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 25340 (device log contains only the most recent five errors)
       CR = Command Register [HEX]
       FR = Features Register [HEX]
       SC = Sector Count Register [HEX]
       SN = Sector Number Register [HEX]
       CL = Cylinder Low Register [HEX]
       CH = Cylinder High Register [HEX]
       DH = Device/Head Register [HEX]
       DC = Device Command Register [HEX]
       ER = Error register [HEX]
       ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 25340 occurred at disk power-on lifetime: 10867 hours (452 days + 19 hours)
 When the command that caused the error occurred, the device was active or idle.

 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 10 51 16 5b b5 a8 f4  Error: IDNF at LBA = 0x04a8b55b = 78165339

 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 20 00 16 5b b5 a8 f4 00      00:08:43.658  READ SECTOR(S)
 20 00 16 5b b5 a8 f4 00      00:08:43.166  READ SECTOR(S)
 20 00 17 5a b5 a8 f4 00      00:08:42.702  READ SECTOR(S)
 10 00 3f 00 00 00 f0 00      00:08:42.702  RECALIBRATE [OBS-4]
 91 00 3f 3f ff 3f ff 00      00:08:42.702  INITIALIZE DEVICE PARAMETERS [OBS-6]

Error 25339 occurred at disk power-on lifetime: 10867 hours (452 days + 19 hours)
 When the command that caused the error occurred, the device was active or idle.

 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 10 51 16 5b b5 a8 f4  Error: IDNF at LBA = 0x04a8b55b = 78165339

 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 20 00 16 5b b5 a8 f4 00      00:08:43.166  READ SECTOR(S)
 20 00 17 5a b5 a8 f4 00      00:08:42.702  READ SECTOR(S)
 10 00 3f 00 00 00 f0 00      00:08:42.702  RECALIBRATE [OBS-4]
 91 00 3f 3f ff 3f ff 00      00:08:42.702  INITIALIZE DEVICE PARAMETERS [OBS-6]
 00 00 00 00 00 00 00 06      00:08:42.663  NOP [Abort queued commands]

Error 25338 occurred at disk power-on lifetime: 10867 hours (452 days + 19 hours)
 When the command that caused the error occurred, the device was active or idle.

 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 10 51 17 5a b5 a8 f4  Error: IDNF at LBA = 0x04a8b55a = 78165338

 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 20 00 17 5a b5 a8 f4 00      00:08:42.702  READ SECTOR(S)
 10 00 3f 00 00 00 f0 00      00:08:42.702  RECALIBRATE [OBS-4]
 91 00 3f 3f ff 3f ff 00      00:08:42.702  INITIALIZE DEVICE PARAMETERS [OBS-6]
 00 00 00 00 00 00 00 06      00:08:42.663  NOP [Abort queued commands]
 20 00 17 5a b5 a8 f4 00      00:08:42.231  READ SECTOR(S)

Error 25337 occurred at disk power-on lifetime: 10867 hours (452 days + 19 hours)
 When the command that caused the error occurred, the device was active or idle.

 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 10 51 17 5a b5 a8 f4  Error: IDNF at LBA = 0x04a8b55a = 78165338

 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 20 00 17 5a b5 a8 f4 00      00:08:42.231  READ SECTOR(S)
 20 00 17 5a b5 a8 f4 00      00:08:41.727  READ SECTOR(S)
 10 00 3f 00 00 00 f0 00      00:08:41.727  RECALIBRATE [OBS-4]
 20 00 17 5a b5 a8 f4 00      00:08:41.262  READ SECTOR(S)
 20 00 17 5a b5 a8 f4 00      00:08:40.768  READ SECTOR(S)

Error 25336 occurred at disk power-on lifetime: 10867 hours (452 days + 19 hours)
 When the command that caused the error occurred, the device was active or idle.

 After command completion occurred, registers were:
 ER ST SC SN CL CH DH
 -- -- -- -- -- -- --
 10 51 17 5a b5 a8 f4  Error: IDNF at LBA = 0x04a8b55a = 78165338

 Commands leading to the command that caused the error were:
 CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
 -- -- -- -- -- -- -- --  ----------------  --------------------
 20 00 17 5a b5 a8 f4 00      00:08:41.727  READ SECTOR(S)
 10 00 3f 00 00 00 f0 00      00:08:41.727  RECALIBRATE [OBS-4]
 20 00 17 5a b5 a8 f4 00      00:08:41.262  READ SECTOR(S)
 20 00 17 5a b5 a8 f4 00      00:08:40.768  READ SECTOR(S)
 10 00 3f 00 00 00 f0 00      00:08:40.768  RECALIBRATE [OBS-4]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     10963         -
# 2  Short offline       Completed without error       00%     10962         -

Device does not support Selective Self Tests/Logging
[EMAIL PROTECTED] david]#
***************************************
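
For anyone wanting to check the numbers in that output by hand, here is a small sketch of the arithmetic. It assumes the usual 28-bit LBA register layout (low four bits of DH, then CH, CL, SN) and a 32-bit millisecond counter behind the Powered_Up_Time wrap; the function name is made up for illustration.

```python
def lba28(sn, cl, ch, dh):
    """Combine the ATA task-file registers into a 28-bit LBA.

    Assumed layout: DH bits 3..0 are LBA bits 27..24, then CH, CL, SN.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


if __name__ == "__main__":
    # Registers from the most recent error entry: SN=5b CL=b5 CH=a8 DH=f4
    lba = lba28(0x5B, 0xB5, 0xA8, 0xF4)
    print(f"LBA = 0x{lba:08x} = {lba}")

    # Power-on lifetime of 10867 hours, split into days and hours
    days, hours = divmod(10867, 24)
    print(f"{days} days + {hours} hours")

    # A 32-bit millisecond counter wraps after this many days
    print(f"{2**32 / (1000 * 60 * 60 * 24):.3f} days")
```

The first result should agree with the "IDNF at LBA" value smartctl printed for that entry.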

If I'm reading those results correctly (a lot of logged errors, 25340 in
all, with the five most recent all IDNF read failures at LBA
78165338/78165339, but zero reallocated, pending, or uncorrectable
sectors), they indicate (to me, at least) that the drive is somewhat
suspect, but not fatally so.  I stand to be corrected, though.

I'm intending to wipe everything and start over clean, to get rid of
legacy junk that I no longer need (such as Windows ;) ), and I'm
thinking I might just remove that drive altogether since it's
interfering with things, and just use the Maxtor.

"160GB should be enough for anybody..."

Thanks for the help.

David

The pen is mightier than the sword, but only if the sword is very small
and the pen is very sharp.
