On Sun, 2005-09-04 at 12:23 -0400, James Cloos wrote:
> I got some corruption with a post-2.6.13 kernel (shortly after the jfs
> changes were merged).

Is this a Linus kernel, or something different?  There has only been one
post-2.6.13 change to jfs in Linus' kernel, and that looks pretty
harmless.

> I didn't notice the oops at first -- except that the:
> 
>   mount -n -o remount,rw
> 
> was stuck in text.lock (IIRC; text.something in any case).
> 
> In addition to the remount getting stuck, sync(1) and umount(8) also
> got stuck, so I was forced to do an emergency sync/sync/umount/boot.

From the look of the stack trace, the oops must have taken place during
the 'mount -n -o remount,ro /' in /etc/init.d/checkroot.  It probably
left something locked when it oopsed, causing subsequent operations to
hang.

> I dropped back to a kernel closer to 2.6.13 as released and that is
> working fine, AFAICT.

> 
> The oops is below.
> 
> The symptom is mostly in the form of meta-data corruption for anything
> that was changed under that kernel.  Several binaries ended up with
> 0666 rather than 0755 perms, as an example.  /etc/ld.so.conf was empty
> (but easily recoverable as Gentoo frequently auto-gens it based on
> what packages are installed; in fact that frequent auto-generation is
> probably why it ended up empty).  
> 
> I suspect the emerge process copies the files into the filesystem such
> that they are opened with 666 perms and then chmod(2)ed to the perms
> they had in the staging install tree, which suggests that the failure
> is only with meta-data changes.
> 
> I still have that kernel installed, so I can do some more debugging if
> helpful, but only minor stuff as I need to boot with init=/bin/bash to
> keep the box usable....
> 
> ,----
> | [4294746.121000] BUG at fs/jfs/jfs_logmgr.c:1622 assert(list_empty(&log->cqueue))
> | [4294746.121000] ------------[ cut here ]------------
> | [4294746.121000] kernel BUG at fs/jfs/jfs_logmgr.c:1622!

Hmm.  For some reason, jfs was unable to write everything to the journal.
I don't know what could have triggered this in a recent kernel.  Is the
file system on an IDE drive?  I don't see any recent changes to IDE, or
anything else post-2.6.13, that would explain this.

I wonder whether whatever caused this is a bug already present in 2.6.13
that is just not easily reproduced.  Did you try it more than once on the
later kernel?
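
If it helps to gauge how far the metadata damage reaches, something like
the rough sketch below could list regular files whose permission bits are
exactly 0666, the symptom described above for the affected binaries.  The
function name find_mode_0666 is made up for illustration, and the check
assumes that 0666 on a file in a binary directory is always wrong, which
may not hold everywhere.

```python
import os
import stat


def find_mode_0666(root):
    """Return sorted paths of regular files under root with mode exactly 0666.

    Hypothetical helper: it only inspects permission bits and does not
    detect any other kind of metadata corruption.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are not followed
                st = os.lstat(path)
            except OSError:
                # unreadable or vanished entry; skip it
                continue
            if stat.S_ISREG(st.st_mode) and stat.S_IMODE(st.st_mode) == 0o666:
                hits.append(path)
    return sorted(hits)
```

Calling find_mode_0666("/usr/bin") and comparing the result against the
packages installed under the later kernel might show whether only files
touched by that kernel are affected.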
-- 
David Kleikamp
IBM Linux Technology Center

_______________________________________________
Jfs-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/jfs-discussion