Volker Kuhlmann wrote:
I've been attempting to install SuSE 9.2 from a DVD ISO I downloaded and
burnt.
The computer boots from the DVD fine, but as soon as the installer gets
past the language and keyboard selection to the hardware side of things,
and at each stage of the install thereafter, one of my two hard drives
starts churning away - brrrrrp, brrrrrp, brrrrrp - twice a second for
about 10 minutes at a time, before it finally stops and the installer
goes to the next step. The installer is unresponsive while the drive is
churning, but responds normally when it finally stops.
I/O errors on the hard disk.
I tried disconnecting that drive, and the SuSE installer was happy as a
clam, going from one step to the next instantly. Plugged the drive back
in, and back to Churn City again.
Yes, no doubt about it. Good post btw, with all the info that may be
relevant.
You need to check out that drive. While it's getting stuck there must be
errors logged in syslog (during install, peek around the other consoles
- alt-f2, alt-f3 etc). Smartmontools is superb for querying the disk's
own idea about its performance. Boot from your install media and select
"rescue system". When booted, log in as root. You want the smartctl
command, though you'll have to read the man page for it somewhere other
than the rescue system (it has no room for documentation).
Basics (substitute the appropriate device name for hda):
  smartctl -a /dev/hda
Display complete disk status. Of primary importance here are:
  Overall health status: if FAILED then it's a warranty / dustbin case.
  Reallocated sectors: about 1 per month of disk lifetime is acceptable; a
    sudden steep increase -> warranty/dustbin.
  Any unreadable sectors logged in the error log section.
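To pick just those items out of the full report, something along these
lines should do. Treat it as a rough sketch: the names smart_summary and
realloc_budget are made up for illustration, the patterns assume your
drive uses the usual attribute names (they vary by vendor), and the
budget takes power-on hours as a stand-in for disk lifetime with a month
counted as 730 hours.

```shell
# Hypothetical helper: keep only the lines of a "smartctl -a" report that
# matter most for a quick health check. Reads the report on stdin.
smart_summary() {
    grep -E 'overall-health|Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|ATA Error Count'
}

# Hypothetical helper: reallocated-sector allowance under the
# "1 per month of lifetime" rule of thumb. Argument: power-on hours.
# Assumption: one month = 730 hours (8760 hours per year / 12).
realloc_budget() {
    echo $(( $1 / 730 ))
}

# Typical use (needs root and a real device):
#   smartctl -a /dev/hda | smart_summary
#   realloc_budget 8760    # prints 12
```

Compare the raw value of Reallocated_Sector_Ct against the figure
realloc_budget prints for the drive's Power_On_Hours.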
You may have to run
  smartctl -s on -S on -o on /dev/hda
once, or once after each boot, to enable the SMART features of the disk.
You can run self-tests with
  smartctl -t long /dev/hda
  smartctl -t short /dev/hda
  smartctl -t offline /dev/hda
and use -a to query their status. Don't start another test before the
previous is finished. The long one can run for an hour.
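If you'd rather not wade through the whole -a report each time, the
self-test log alone can be printed with -l selftest, and the newest
entry is always number 1. A rough sketch of checking it from a script
follows; selftest_ok is a made-up name, and the match on "Completed
without error" assumes the wording current smartctl versions print.

```shell
# Hypothetical helper: succeed if the newest self-test log entry ("# 1")
# reports completion without error.
# Reads "smartctl -l selftest" output on stdin.
selftest_ok() {
    grep -E '^# *1 ' | grep -q 'Completed without error'
}

# Typical use (needs root and a real device):
#   smartctl -l selftest /dev/hda | selftest_ok && echo "last test clean"
```

A test that is still running, or one that hit a read failure, makes the
function return non-zero, so it's safe to call in a loop with a sleep
while waiting for the long test.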
I think you will find a region of some dead sectors on this disk. If
this region increases in size -> warranty/dustbin. Or use the disk as a
study object for how bad disks behave, so you can spot it faster next
time.
Oh, to save you some time: Do *not* use badblocks to find out whether
your disk is faulty. It is simply not geared to test contemporary disk
technology in an appropriate manner. That is, on the extremely rare
occasion when it does indeed find a fault, the disk is truly stuffed.
In all other cases it doesn't see a fault while looking into the abyss,
but it takes a very long time looking. There are a few minor purposes
badblocks still serves, but its primary implied function is not one of
them.
Volker
I finally got a chance to run 'smartctl' on that drive last night.
Here's the result, after both a 'short' and 'long' test:
***************************************
[EMAIL PROTECTED] david]# smartctl -a /dev/hdf
smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Device Model: ST340823A
Serial Number: 7EF1LE28
Firmware Version: 3.54
Device is: In smartctl database [for details use: -P show]
ATA Version is: 4
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Thu Feb 3 01:30:37 2005 NZDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 422) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  44) minutes.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000e   073   051   025    Old_age   Always       -       111928140
  3 Spin_Up_Time            0x0002   085   070   000    Old_age   Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       1018
  5 Reallocated_Sector_Ct   0x0032   100   100   036    Old_age   Always       -       0
  7 Seek_Error_Rate         0x000e   084   060   030    Old_age   Always       -       299427167
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       10963
 10 Spin_Retry_Count        0x0012   100   100   097    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   020    Old_age   Always       -       1367
194 Temperature_Celsius     0x0022   041   050   000    Old_age   Always       -       41
195 Hardware_ECC_Recovered  0x001a   065   055   000    Old_age   Always       -       17688879
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   100   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0
SMART Error Log Version: 1
ATA Error Count: 25340 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 25340 occurred at disk power-on lifetime: 10867 hours (452 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 16 5b b5 a8 f4  Error: IDNF at LBA = 0x04a8b55b = 78165339
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  20 00 16 5b b5 a8 f4 00      00:08:43.658  READ SECTOR(S)
  20 00 16 5b b5 a8 f4 00      00:08:43.166  READ SECTOR(S)
  20 00 17 5a b5 a8 f4 00      00:08:42.702  READ SECTOR(S)
  10 00 3f 00 00 00 f0 00      00:08:42.702  RECALIBRATE [OBS-4]
  91 00 3f 3f ff 3f ff 00      00:08:42.702  INITIALIZE DEVICE PARAMETERS [OBS-6]
Error 25339 occurred at disk power-on lifetime: 10867 hours (452 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 16 5b b5 a8 f4  Error: IDNF at LBA = 0x04a8b55b = 78165339
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  20 00 16 5b b5 a8 f4 00      00:08:43.166  READ SECTOR(S)
  20 00 17 5a b5 a8 f4 00      00:08:42.702  READ SECTOR(S)
  10 00 3f 00 00 00 f0 00      00:08:42.702  RECALIBRATE [OBS-4]
  91 00 3f 3f ff 3f ff 00      00:08:42.702  INITIALIZE DEVICE PARAMETERS [OBS-6]
  00 00 00 00 00 00 00 06      00:08:42.663  NOP [Abort queued commands]
Error 25338 occurred at disk power-on lifetime: 10867 hours (452 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 17 5a b5 a8 f4  Error: IDNF at LBA = 0x04a8b55a = 78165338
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  20 00 17 5a b5 a8 f4 00      00:08:42.702  READ SECTOR(S)
  10 00 3f 00 00 00 f0 00      00:08:42.702  RECALIBRATE [OBS-4]
  91 00 3f 3f ff 3f ff 00      00:08:42.702  INITIALIZE DEVICE PARAMETERS [OBS-6]
  00 00 00 00 00 00 00 06      00:08:42.663  NOP [Abort queued commands]
  20 00 17 5a b5 a8 f4 00      00:08:42.231  READ SECTOR(S)
Error 25337 occurred at disk power-on lifetime: 10867 hours (452 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 17 5a b5 a8 f4  Error: IDNF at LBA = 0x04a8b55a = 78165338
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  20 00 17 5a b5 a8 f4 00      00:08:42.231  READ SECTOR(S)
  20 00 17 5a b5 a8 f4 00      00:08:41.727  READ SECTOR(S)
  10 00 3f 00 00 00 f0 00      00:08:41.727  RECALIBRATE [OBS-4]
  20 00 17 5a b5 a8 f4 00      00:08:41.262  READ SECTOR(S)
  20 00 17 5a b5 a8 f4 00      00:08:40.768  READ SECTOR(S)
Error 25336 occurred at disk power-on lifetime: 10867 hours (452 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 17 5a b5 a8 f4  Error: IDNF at LBA = 0x04a8b55a = 78165338
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  20 00 17 5a b5 a8 f4 00      00:08:41.727  READ SECTOR(S)
  10 00 3f 00 00 00 f0 00      00:08:41.727  RECALIBRATE [OBS-4]
  20 00 17 5a b5 a8 f4 00      00:08:41.262  READ SECTOR(S)
  20 00 17 5a b5 a8 f4 00      00:08:40.768  READ SECTOR(S)
  10 00 3f 00 00 00 f0 00      00:08:40.768  RECALIBRATE [OBS-4]
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     10963         -
# 2  Short offline       Completed without error       00%     10962         -
Device does not support Selective Self Tests/Logging
[EMAIL PROTECTED] david]#
***************************************
If I'm reading those results correctly (a lot of errors, a bunch of
reallocated sectors, and a bunch of uncorrectable errors), they indicate
(to me, at least) that the drive is somewhat suspect, but not fatally
so. I stand to be corrected, though.
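As a sanity check on the error log, the LBA in each entry can be rebuilt
from the register dump: for these 28-bit addresses it is the low nibble
of DH followed by CH, CL and SN. The sketch below does that, and also
the hours-to-days conversion smartctl prints. The function names are my
own invention, and it assumes a shell whose arithmetic accepts hex
constants (bash and dash both do, as far as I know).

```shell
# Hypothetical helper: rebuild a 28-bit LBA from the DH, CH, CL and SN
# register bytes (hex digits, without the 0x prefix) shown in the
# SMART error log.
lba_from_regs() {
    dh=$1 ch=$2 cl=$3 sn=$4
    echo $(( ((0x$dh & 0x0f) << 24) | (0x$ch << 16) | (0x$cl << 8) | 0x$sn ))
}

# Hypothetical helper: split power-on hours into whole days plus hours.
hours_to_days() {
    echo "$(( $1 / 24 )) days + $(( $1 % 24 )) hours"
}

lba_from_regs f4 a8 b5 5b    # prints 78165339
lba_from_regs f4 a8 b5 5a    # prints 78165338
hours_to_days 10867          # prints 452 days + 19 hours
```

So all five logged errors are at two adjacent sectors, 78165338 and
78165339, and all happened within the same hour of power-on time.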
I'm intending to wipe everything and start over clean, to get rid of
legacy junk that I no longer need (such as Windows ;) ), and I'm
thinking I might just remove that drive altogether since it's
interfering with things, and just use the Maxtor.
"160GB should be enough for anybody..."
Thanks for the help.
David
The pen is mightier than the sword, but only if the sword is very small
and the pen is very sharp.