On Sun, 2005-09-04 at 12:23 -0400, James Cloos wrote:
> I got some corruption with a post 2.6.13 kernel (shortly after the jfs
> changes where merged).

Is this a Linus kernel, or something different? There has only been one
post-2.6.13 change to jfs in Linus' kernel, and that looks pretty
harmless.

> I didn't notice the oops at first -- except that the:
>
>    mount -n -o remount,rw
>
> was stuck in text.lock (IIRC; text.something in any case).
>
> In addition to the remount getting stuck, sync(1) and umount(8) also
> got stuck, so I was forced to do an emergency sync/sync/umount/boot.

From the look of the stack trace, the oops must have taken place during
the 'mount -n -o remount,ro /' in /etc/init.d/checkroot. It probably
left something locked when it oopsed, causing subsequent operations to
hang.

> I dropped back to a kernel closer to 2.6.13 as released and that is
> working fine, AFAICT.
>
> The oops is below.
>
> The symptom is mostly in the form of meta-data corruption for anything
> that was changed under that kernel. Several binaries ended up with
> 0666 rather than 0755 perms, as an example. /etc/ld.so.conf was empty
> (but easily recoverable as Gentoo frequently auto-gens it based on
> what packages are installed; in fact that frequent auto-generation is
> probably why it ended up empty).
>
> I suspect the emerge process copies the files into the filesystem such
> that they are opened with 666 perms and then chmod(2)ed to the perms
> they had in the staging install tree, which suggests that the failure
> is only with meta-data changes.
>
> I still have that kernel installed, so I can do some more debugging if
> helpful, but only minor stuff as I need to boot with init=/bin/bash to
> keep the box usable....
>
> ,----
> | [4294746.121000] BUG at fs/jfs/jfs_logmgr.c:1622
> assert(list_empty(&log->cqueue))
> | [4294746.121000] ------------[ cut here ]------------
> | [4294746.121000] kernel BUG at fs/jfs/jfs_logmgr.c:1622!

Hmm. For some reason, jfs was unable to write everything to the
journal. I don't know what could have triggered this in a recent
kernel. Is the file system on an ide drive?
I don't see any recent changes to ide, or anything else post-2.6.13
that would explain this. I wonder if whatever caused this is a bug in
2.6.13, but it's just not easily reproduced. Did you try it more than
once on the later kernel?

--
David Kleikamp
IBM Linux Technology Center
