On Sat, 2005-10-15 at 00:53 -0400, Daniel Savard wrote:
> I am running many Gentoo systems and I ran into some problems which
> seem related to the filesystem.
> 
> In my case, I am running kernel 2.6.12 or 2.6.13 on the machines I had
> problems with. I was never able to identify the source of my problem.
> 
> All my filesystems are jfs, except /boot. I am also using LVM, except
> for /.
> 
> The problem is as follows: it seems the filesystem gets corrupted when
> activity is high on it. I had the following problem twice, on two
> different machines: my /etc was entirely moved to lost+found, and in one
> case I was not able to complete the recovery from lost+found. I hit a
> corrupted file, and when I tried to cat the file it hung and I was no
> longer able to ls in the lost+found directory.

There was a data corruption problem on the 2.6.12 kernel in the
device-mapper code that affected lvm volumes:
http://bugzilla.kernel.org/show_bug.cgi?id=4946

I would assume that the corruption might have affected jfs's metadata as
well as regular data.  It was fixed in 2.6.12.4 and 2.6.13.

> Right now, I have a machine I just cannot shut down. All the commands
> looking at files in /root and some other directories are hanging. Since
> I am not on-site, I cannot power off/power on and run fsck. And before
> doing this, I would like to know if there is something I can do to help
> identify the source of the problem. This server had many very active
> filesystems when I hit that problem. It is running squid, emerge
> http-replicator and it was running an emerge update. The disk subsystem
> is RAID 5.
> 
> Anything I can do to determine the problem and troubleshoot my server?

First, I'd run dmesg and look for anything suspicious.  Then type
"echo t > /proc/sysrq-trigger", then
"dmesg -s 1000000 > /somewhere/you/can/still/write/to"

It never hurts to do "echo s > /proc/sysrq-trigger" if the system hangs
during shutdown.  (This starts an emergency sync of the disks; it runs
asynchronously, so give it a few seconds.)  After that,
"echo b > /proc/sysrq-trigger" will probably let you reboot remotely.
I'm assuming you can still get to a shell.
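
The steps above could be collected into a few small shell functions, roughly
as sketched here.  This is only an illustration: the names SYSRQ_TRIGGER,
OUT, SYNC_WAIT, sysrq, dump_tasks and sync_and_reboot are made up for this
example, the output path is a placeholder, and everything needs to run as
root with sysrq enabled.

```shell
#!/bin/sh
# Hypothetical helper functions; none of these names come from the kernel.
# SYSRQ_TRIGGER can be overridden so the functions can be tried on a scratch file.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
# OUT must be on a filesystem that still accepts writes (placeholder path).
OUT="${OUT:-/tmp/sysrq-tasks.txt}"
# Seconds to wait for the asynchronous sync before rebooting (a guess, not a guarantee).
SYNC_WAIT="${SYNC_WAIT:-5}"

# Write a single sysrq command letter to the trigger file.
sysrq() {
    printf '%s\n' "$1" > "$SYSRQ_TRIGGER"
}

# Dump the state of all tasks into the kernel log, then save the log.
dump_tasks() {
    sysrq t
    dmesg -s 1000000 > "$OUT"
}

# Request an emergency sync, wait a little, then reboot immediately.
# Note that "b" reboots without unmounting anything.
sync_and_reboot() {
    sysrq s
    sleep "$SYNC_WAIT"
    sysrq b
}
```

Sourcing the file only defines the functions; nothing happens until you call
dump_tasks or sync_and_reboot yourself.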

> I must admit I am a little bit discouraged with jfs. This kind of
> problem is time consuming and I am about to decide to switch to
> another fs.

I think the dm bug may have been a problem for you in the 2.6.12 kernel
(fixed in 2.6.12.4), and there was a bit of a nasty jfs bug in 2.6.13
that was fixed in 2.6.13.2.  (If you are on big-endian hardware, I've
got another patch on top of that.)
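
To check which of those two version ranges a machine falls into, something
like the following sketch might help.  It assumes GNU sort (for "sort -V"),
and the function names version_ge and kernel_status are invented for this
example.  It only looks at the upstream version number, so a distribution
kernel that carries backported fixes (or a suffix such as -gentoo-r5) may be
reported as affected when it is not.

```shell
#!/bin/sh
# version_ge A B: succeeds if version A is greater than or equal to version B.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# kernel_status VERSION: print which of the two bugs mentioned above the
# upstream version number falls under, judging by the version alone.
kernel_status() {
    v="${1%%-*}"    # drop any suffix such as -gentoo-r1 or -rc4
    if version_ge "$v" 2.6.12 && ! version_ge "$v" 2.6.12.4; then
        echo "dm-bug"        # 2.6.12 up to, but not including, 2.6.12.4
    elif version_ge "$v" 2.6.13 && ! version_ge "$v" 2.6.13.2; then
        echo "jfs-bug"       # 2.6.13 up to, but not including, 2.6.13.2
    else
        echo "not-in-range"
    fi
}

# Report on the running kernel.
kernel_status "$(uname -r)"
```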

If you're running the vanilla kernel, I see that
vanilla-sources-2.6.13.2 is masked.  You may have better luck with that
kernel.

I haven't had any problems running jfs on gentoo in the latest 2.6.14-rc
kernels, but I'm not running on lvm.

> TIA,
> 
> Daniel
-- 
David Kleikamp
IBM Linux Technology Center

_______________________________________________
Jfs-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/jfs-discussion