On Sat, 2005-10-15 at 00:53 -0400, Daniel Savard wrote:
> I am running many Gentoo systems and I ran into some problems which
> seems related to the filesystem.
>
> In my case, I am running kernel 2.6.12 or 2.6.13 on machines I had
> problem with. I was never able to identify the source of my problem.
>
> All my filesystems are jfs, except /boot. I am also using LVM, except
> for /.
>
> The problem is as follow, it seems the filesystem get corrupted when
> activity is high on it. I had twice the following problem on two
> different machines, my /etc was entirely moved to lost+found and in one
> case I was not able to complete the recovering from lost+found, I hit a
> corrupted file and when trying to cat the file it hanged and I was no
> longer able to ls in the lost+found directory.

There was a data corruption problem on the 2.6.12 kernel in the
device-mapper code that affected lvm volumes:
http://bugzilla.kernel.org/show_bug.cgi?id=4946

I would assume that the corruption might have affected jfs's metadata as
well as regular data. It was fixed in 2.6.12.4 and 2.6.13.

> Right now, I have a machine I just cannot shutdown. All the commands
> looking at files in /root and some other directories are hanging. Since
> I am not on-site, I cannot power off/power on and run fsck. And before
> doing this, I would like to know if there is something I can do to help
> identify the source of the problem. This server was having many fs very
> actives when I hit that problem. It is running squid, emerge
> http-replicator and it was running an emerge update. The disk subsystem
> is RAID 5.
>
> Anything I can do to determine the problem and troubleshoot my server?

First, I'd run dmesg and look for anything suspicious. Then type
"echo t > /proc/sysrq-trigger", then
"dmesg -s 1000000 > /somewhere/you/can/still/write/to"

It never hurts to do "echo s > /proc/sysrq-trigger" if the system hangs
during shutdown. (This syncs the disks asynchronously.) After that
"echo b > /proc/sysrq-trigger" will probably let you reboot remotely.

I'm assuming you can still get to a shell.

> I must admit I am a little bit discouraged with jfs, this kind of
> problems are time consuming and I am about to decide to switch to
> another fs.

I think the dm bug may have been a problem for you in the 2.6.12 kernel
(fixed in 2.6.12.4), and there was a bit of a nasty jfs bug in 2.6.13
that was fixed in 2.6.13.2. (If you are on big-endian hardware, I've
got another patch on top of that.)

If you're running the vanilla kernel, I see that
vanilla-sources-2.6.13.2 is masked. You may have better luck with that
kernel. I haven't had any problems running jfs on gentoo in the latest
2.6.14-rc kernels, but I'm not running on lvm.
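
The SysRq steps above could be strung together roughly as follows. This is only a sketch under some assumptions: the TRIGGER and OUT variables, the function names, the "run" argument, and the 5-second pause after the sync are all illustrative and not part of the original advice, and the default OUT path is a placeholder that should point at a filesystem that still responds. It needs root, and nothing happens unless the script is started with "run".

```shell
#!/bin/sh
# Sketch only: TRIGGER, OUT, the function names and the "run" argument are
# illustrative assumptions. Must be run as root to have any effect.

TRIGGER="${TRIGGER:-/proc/sysrq-trigger}"
OUT="${OUT:-/tmp/sysrq-tasks.log}"   # placeholder: pick a filesystem that is not hung

sysrq() {
    # Write a single SysRq command letter to the trigger file.
    printf '%s\n' "$1" > "$TRIGGER"
}

collect_and_reboot() {
    sysrq t                       # dump the state of every task to the kernel log
    dmesg -s 1000000 > "$OUT"     # save the kernel log using a large buffer
    sysrq s                       # request an emergency sync of the disks
    sleep 5                       # assumption: give the asynchronous sync time to finish
    sysrq b                       # reboot immediately, without a clean shutdown
}

# Only act when explicitly asked to, since the last step reboots the machine.
if [ "${1:-}" = "run" ]; then
    collect_and_reboot
fi
```

For example, "OUT=/boot/sysrq-tasks.log sh sysrq-collect.sh run" (the script name and the choice of /boot are hypothetical) would save the task dump on the one non-jfs filesystem before rebooting. Copy the saved log off the machine before the reboot if there is any doubt about that filesystem.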

> TIA,
>
> Daniel
--
David Kleikamp
IBM Linux Technology Center

_______________________________________________
Jfs-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/jfs-discussion
