Hello,

Actually, I got the SMART data from GNOME's Disk Utility. It gives me the total number of bad (reallocated) sectors ever found.

I do have certain issues with my ATI graphics card, and given the coincidence, it might have written DMA data where it shouldn't.

But as I mentioned before, I really looked for something wrong before that crucial reboot, in all possible logs. Nothing. And still nothing after that.

Looks like a strike of bad luck.

Regards,
   Eli

On 22/09/17 18:21, Borissh1983 wrote:
I'm assuming you actually ran a SMART scan to set the counters
(a few hours per scan). What you describe sounds like something caused
by an X issue: there have been several different bugs, both in X itself
and in some DEs, that made the screen "freeze" (the workaround was to
switch to a different VT and back) while the apps themselves continued
to run.

Check your dmesg and other logs for any messages containing strings
such as link_down, exception Emask, failed command, or SError. If
nothing like that exists (and you did run a SMART scan), you should be OK.

If any such message exists, it could be either the drive or the cables.
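
The log check described above can be automated. Below is a minimal sketch in Python, assuming the log text is already available as a string. The function name find_suspicious_lines is made up for illustration, and the pattern list is simply the four strings named above, matched case-insensitively.

```python
import re

# The four strings named in the message above.
# Case-insensitive matching is an assumption made for convenience.
PATTERNS = ["link_down", "exception Emask", "failed command", "SError"]


def find_suspicious_lines(log_text, patterns=PATTERNS):
    """Return (line_number, line) pairs for lines containing any pattern."""
    regex = re.compile("|".join(re.escape(p) for p in patterns), re.IGNORECASE)
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if regex.search(line)
    ]
```

One way to use it is to capture the kernel log with subprocess.run(["dmesg"], capture_output=True, text=True) and pass the resulting stdout to the function. An empty result only means that none of these strings appeared in the text that was scanned.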

On 9/22/17, Eli Billauer<e...@billauer.co.il>  wrote:
Hello all,

TL;DR: My hard disk's filesystem was corrupt, but the SMART statistics
are perfect. Should I replace the hard disk?

Full version:

It seems like one of my hard disks has passed its own premature Yom
Kippur verdict. Rebooting my computer this morning, it failed to mount,
saying "Group descriptor 32768 checksum is invalid" and forced me into a
shell.

I made the mistake (?) of running fsck and then aborting it with a
(proper CTRL-ALT-DEL) reboot, as it took ages. This is a 3 TB disk,
which isn't necessary for booting, so I removed it from /etc/fstab, and
brought up the computer fine.

Then I ran fsck on that disk, which generated a log of 125 MB, and
basically threw everything into /lost+found, leaving nothing in the root
directory. Hurray.

It's a Western Digital WDC WD30EZRX-00DC0B0, with one big ext4 over LUKS
over LVM, 4 years in service, containing stuff that doesn't deserve a
backup. So the damage is limited, but I wonder if I should replace the
disk.

Despite its age, this disk's SMART status is perfect: No bad sectors, no
reallocated sectors, nothing. No parameter can be better. I know there's
a "don't trust SMART" word around, but had a sector failed, I would
expect that to appear in the statistics. I mean, I do understand that
SMART can't predict a failure, but doesn't it mean anything?
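
For reference, the sector counters discussed above can also be read outside a GUI. The following is a rough sketch in Python, assuming the attribute table was obtained with smartmontools' "smartctl -A" (not the tool used in this thread) and saved as text. The function name nonzero_sector_counters and the choice of which attributes to watch are assumptions for illustration only.

```python
# Attribute names as printed in the `smartctl -A` table.
# Which attributes are worth watching is an assumption of this sketch.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def nonzero_sector_counters(smartctl_output, watched=WATCHED):
    """Return {attribute_name: raw_value} for watched attributes whose raw value is nonzero."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is the tenth column.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in watched and raw.isdigit() and int(raw) != 0:
            found[name] = int(raw)
    return found
```

An empty dictionary corresponds to the "perfect" status described above. It says nothing about corruption caused by something other than the disk surface.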

And there's another thing: The reason I rebooted the computer was that I
found the screen frozen, but the mouse pointer moved. The time stood
still at 3:01 (AM). This is highly unusual on my computer, which usually
runs for months with zero issues.

So I connected with ssh, and saw nothing suspicious: Not in
/var/log/messages, not in dmesg, not in .xsession-errors. No process was
busy in particular. From the remote terminal, I couldn't have guessed
something was wrong. So I issued a reboot from remote, which failed as I
mentioned above.

Bottom line: The panic instinct is to replace the disk, even though the
whole computer is due for replacement within a year or so. Money left
aside, it's a bit of an effort, and involves a lot of scary commands as
root, which are a risk factor by themselves. I'm not implying that I'm
stupid enough to mke2fs the wrong disk. Not me. I never err. ;)

Insights are welcome.

Shana Tova,
     Eli

--
Web: http://www.billauer.co.il


_______________________________________________
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il



