Hi,
I just had some massive filesystem corruption on the root filesystem
of a Debian potato system that was installed about a month ago and
had been working fine until now, running a self-compiled 2.2.13
kernel. The box had been running GNU/Linux (Red Hat) for about a year
and a half before that with no trouble either.
After an fsck -y, I found that all of /bin and nearly all of the
contents of /etc had been moved to lost+found, losing their
filenames, and I ended up reinstalling the system. Is there any way
to restore the files on a destroyed root filesystem (/var, /usr and
/home are all separate, undamaged filesystems)? Or is a full
reinstall the only way to really fix this, other than restoring from
tape backups (which I do not have and cannot create)?
Now, after reinstalling the system again and spending several hours
reconfiguring things, running fsck every so often and finding nothing
wrong, I did some more work and noticed a cron job causing a lot of
disk activity (building the locate database or something). So I
checked the filesystems again, and the root filesystem was ruined
again, just like before! This time I had made backups of /bin, /sbin,
/etc, /boot, /lib, /root and such into tar archives on the /home
partition. (/ is the ONLY partition to be damaged, other than the
filetype errors that are always there on /var, which I have mentioned
before but never got a reply on.) I ran e2fsck -c and redirected all
its output to a floppy so I would have a reference. After it finished
`fixing' everything, mostly by moving everything into lost+found
under inode numbers, I restored the tar archives I had made, since
everything was ruined anyway. After restoring most of them, the
kernel spat out a few errors about blocks or some such and the
filesystem was totally hosed again. No commands worked any more,
since apparently the kernel could not read /sbin and /bin (though I
could get listings using sash's built-in ls). Rebooting resulted in a
kernel panic: no init found.
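For what it is worth, one way to put names back on the nameless
entries in lost+found might be to compare their contents against the
tar archives by checksum. This is only a rough sketch, assuming the
archives hold good copies of the same files; the function names are
made up for illustration, and it only reports matches without
renaming or moving anything.

```python
import hashlib
import os
import tarfile


def sha256_of_stream(stream, chunk_size=65536):
    """Return the SHA-256 hex digest of a binary file-like object."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def index_archive(tar_path):
    """Map content hash -> member name for regular files in a tar archive."""
    index = {}
    with tarfile.open(tar_path) as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, symlinks, device nodes
            stream = archive.extractfile(member)
            # Keep the first name seen when several files share content.
            index.setdefault(sha256_of_stream(stream), member.name)
    return index


def match_lost_found(lost_found_dir, tar_path):
    """Map lost+found entry name -> original name found in the archive."""
    index = index_archive(tar_path)
    matches = {}
    for entry in sorted(os.listdir(lost_found_dir)):
        path = os.path.join(lost_found_dir, entry)
        if os.path.islink(path) or not os.path.isfile(path):
            continue  # only compare regular files
        with open(path, "rb") as stream:
            name = index.get(sha256_of_stream(stream))
        if name is not None:
            matches[entry] = name
    return matches
```

Calling match_lost_found("/lost+found", "/home/backup/bin.tar") (both
paths hypothetical) would return a dictionary of the recovered names.
It does nothing about whatever is corrupting the filesystem in the
first place, and files changed since the backup will not match.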
I am beginning to think that the disk may be going bad, but
badblocks, e2fsck -c and mke2fs -c all report no bad blocks. This is
an IDE disk. Is there any other way to find out whether the disk is
at fault, or is this the kernel? I have been running 2.2.13 on a
different machine since it came out and have had no problems with it,
except for the constant filetype errors that show up on the /var
filesystem, but they seem to be minor and do not hurt anything that I
can see.
Would the fsck output be useful to anyone more knowledgeable about
the filesystem than I am in determining whether this is hardware or
software related? (I can send it to anyone who is interested, if
only to admire the completeness of the ruination. The log file is
about 64000 bytes.)
I have read that there are still reports of filesystem corruption in
the stable kernels, but that it has been mostly unreproducible. If
that is what is happening here, it looks like I can reproduce it just
fine :|
I tend to think this is not a kernel problem, since the 6 other
mounted filesystems are completely undamaged after / gets hosed...
Any other pointers on where I should go from here in troubleshooting
this problem?
Ethan