Hi Felix,

Your SMART data looks good to me, except for the hard drive temperature.
53°C seems rather high to me. Still, this should not be the cause of
your corrupted data.

Two data-corruption problems on the same server that look independent of
each other, and that occurred a long time apart, remind me of a server
that caused me lots of trouble until I discovered it had memory defects.
I suspected hard disk failure and/or on-disk data corruption, but couldn't
confirm it with smartctl or with the badblocks utility. I eventually
pinned down the problem through extensive testing with the stress utility,
which showed that in some runs the memory was corrupting data (which ended
up corrupting data on disk). I had to run the tests many times to spot the
defect. Subtle defects are really hard to spot.
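
To illustrate why repeated runs matter, here is a minimal Python sketch of
a write-and-verify loop. It is only an illustration of the idea, not what
the stress utility does internally, and the function names (verify_pass,
run_many) and the byte patterns are my own assumptions. A userspace script
like this is far weaker than a dedicated tool such as memtest, since the
OS decides which physical memory it touches.

```python
def verify_pass(size_bytes: int, pattern: int) -> int:
    """Fill a buffer with one byte value, read it back, count mismatches."""
    buf = bytearray([pattern]) * size_bytes
    # bytearray.count(int) counts the bytes equal to that value.
    return size_bytes - buf.count(pattern)


def run_many(runs: int, size_bytes: int,
             patterns=(0x00, 0xFF, 0xAA, 0x55)) -> int:
    """Repeat the pass for several patterns; return total mismatches."""
    total = 0
    for _ in range(runs):
        for pattern in patterns:
            total += verify_pass(size_bytes, pattern)
    return total


if __name__ == "__main__":
    # Hypothetical sizes: 20 runs over a 64 MiB buffer.
    errors = run_many(runs=20, size_bytes=64 * 1024 * 1024)
    print("mismatched bytes:", errors)
```

On healthy memory the count stays at zero. An intermittent defect may only
show up in a few runs out of many, which is why a single clean run proves
little.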

IMO, you should do a full scan of this server to find where the problem
is, so that you can consider this trail of problems definitively solved.
In my situation, similar to yours, the problems occurred too far apart to
justify committing resources to a thorough investigation. This period of
uncertainty and intuitive distrust of the server caused us hidden costs
such as stress and fatigue. Having experienced it, if that happened again,
I would prefer to rule out this possibility quickly rather than leave it
lying dormant.

Here are some links that might be relevant to you:
  - https://en.wikipedia.org/wiki/Badblocks
  - https://wiki.archlinux.org/title/Badblocks
  - https://man.archlinux.org/man/stress.1
  - https://wiki.archlinux.org/title/Stress_testing
  - https://www.memtest.org/

Best Regards,
Pierre.
