2009/1/16 Johannes Wiedersich:
> Davide Mancusi wrote:
>> The hard disk of my 4-year-old laptop is starting to fail. I ran
>> fsck.ext3 -c on my root partition yesterday and a few blocks were
>> marked as damaged. The blocks contained some XFCE4 theme files, so I
>> thought that reinstalling the relevant package should be enough. Now,
>> however, the machine hangs every time I start powernowd. Kernel
>> emergency key presses (Alt+SysRq+?) don't work and the usual log files
>> don't contain any relevant information. I have tried uninstalling and
>> reinstalling the powernowd package, but it didn't help; note also that
>> fsck did not report any damaged files belonging to powernowd.
>>
>> Can anyone help me sort this out? Could it be that fsck -c did not
>> mark some blocks as damaged because I ran it with the root partition
>> mounted read-only (as opposed to unmounted)?
>
> If your disk is dying, this could mean just about anything.
>
> Try smartctl from the smartmontools package. What does it report about
> the health status of your disk (after some testing)?
>
> Try e2fsck again to see if it detects 'new' errors on your file system.
>
> I hope you have good backups. You could try diff -r against your backup
> (mounted ro). However, if your disk is damaged and loads and runs
> garbled kernel stuff, you risk hosing your backup. Therefore it might be
> safer to investigate by booting a rescue system from CD or USB disk. YMMV.
Thanks for your response, Johannes. Now I'm confused.

I installed smartmontools, ran

# smartctl -t long /dev/hda

and it detected two bad sectors. I followed the HOWTO at [1] and reallocated the first one. (I had no idea one could recover bad sectors; I thought they were as good as gone.) Then I ran the test again to get the LBA of the second bad sector. Surprise, surprise: the test completed without problems.

I also tried booting off a live CD and running e2fsck -c -c on all ext2/3 partitions. No bad blocks were detected, but one of the inode tables was heavily modified. Oddly, even though no files related to powernowd were touched, powernowd now works again.

From the live CD I ran once more:

# smartctl -t long /dev/hda
[waited one hour]
# smartctl -l selftest /dev/hda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      6264         -
# 2  Extended offline    Completed without error       00%      6262         -
# 3  Extended offline    Completed without error       00%      6259         -
# 4  Short offline       Completed without error       00%      6258         -
# 5  Extended offline    Completed: read failure       30%      6257         95245863

You can see that the most recent test (#1) completed without errors.
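For anyone repeating these steps, the self-test log can also be read mechanically. The sketch below is a hypothetical helper, not part of smartmontools: it assumes the column layout printed by smartctl 5.38 above (newest test listed as #1, Test_Description and Status separated by two or more spaces) and extracts each test's status and LBA_of_first_error. It also includes the LBA-to-filesystem-block conversion used in the smartmontools bad block HOWTO, b = (L - S) * 512 / B, where S is the partition's start sector and B is the filesystem block size. The values 63 and 4096 used below are placeholders, not this disk's real numbers; `fdisk -lu` and `tune2fs -l` would report the actual ones.

```python
import re
from typing import List, NamedTuple, Optional


class SelfTest(NamedTuple):
    num: int                           # "# 1" is the most recent test
    description: str                   # e.g. "Extended offline"
    status: str                        # e.g. "Completed: read failure"
    remaining: str                     # e.g. "30%"
    lifetime_hours: int                # power-on hours when the test ran
    lba_of_first_error: Optional[int]  # None when smartctl prints "-"


# Assumption: Test_Description and Status are separated by 2+ spaces.
_ROW = re.compile(r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+%)\s+(\d+)\s+(\S+)$")


def parse_selftest_log(text: str) -> List[SelfTest]:
    """Parse the table printed by `smartctl -l selftest` (hypothetical helper)."""
    tests = []
    for line in text.splitlines():
        m = _ROW.match(line.strip())
        if m is None:
            continue  # banner, header and blank lines
        num, desc, status, remaining, hours, lba = m.groups()
        tests.append(SelfTest(int(num), desc, status, remaining, int(hours),
                              None if lba == "-" else int(lba)))
    return tests


def failing_lbas(tests: List[SelfTest]) -> List[int]:
    """LBAs from every test that recorded an LBA_of_first_error."""
    return [t.lba_of_first_error for t in tests if t.lba_of_first_error is not None]


def lba_to_fs_block(lba: int, part_start: int, block_size: int,
                    sector_size: int = 512) -> int:
    """b = (L - S) * 512 / B, rounded down: the filesystem block holding sector L."""
    if lba < part_start:
        raise ValueError("LBA lies before the start of the partition")
    return (lba - part_start) * sector_size // block_size
```

Fed the log above, only entry #5 carries an LBA (95245863), while #1, the latest run, reports "Completed without error". With the placeholder partition start of 63 sectors and 4096-byte blocks, that LBA would map to filesystem block 11905725.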
However:

# smartctl -A /dev/hda
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   062    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   105   105   040    Pre-fail  Offline      -       5874
  3 Spin_Up_Time            0x0007   200   200   033    Pre-fail  Always       -       1
  4 Start_Stop_Count        0x0012   096   096   000    Old_age   Always       -       6796
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   120   120   040    Pre-fail  Offline      -       36
  9 Power_On_Hours          0x0012   086   086   000    Old_age   Always       -       6268
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1167
191 G-Sense_Error_Rate      0x000a   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       65
193 Load_Cycle_Count        0x0012   065   065   000    Old_age   Always       -       359932
194 Temperature_Celsius     0x0002   130   130   000    Old_age   Always       -       42 (Lifetime Min/Max 11/58)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       9
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

I still have Current_Pending_Sector == 1, and smartd sends me an e-mail at every reboot complaining about it. What should I do?

Davide

[1] http://tinyurl.com/83g265

