Sorry it's taken me so long to look into this, and thanks for all your
help with this bug.
On Sun, 2004-12-19 at 14:10 -0500, Sonny Rao wrote:
> Hmm, okay I had thought it might be caused by a lack of available
> transaction blocks or locks, but that doesn't seem to be the case
> based on the data you provided.
Actually, we do appear to be out of transaction locks (tlocks).
jfsSync D C0348640 0 1803 1 2174 1802 (L-TLB)
de94bdd8 00000046 dee198b0 c0348640 def1d72c 00000008 c01f1c3a def1d72c
d808f600 000f424b dbe060f0 00000000 12707480 000f424b dee19a58 de94a000
de94bde8 c0118714 de94be08 e097fe87 00000000 00000000 00000000 dee198b0
Call Trace:
[<c01f1c3a>] generic_make_request+0x15e/0x1de
[<c0118714>] default_wake_function+0x0/0x12
[<e097fe87>] txLockAlloc+0xa7/0x160 [jfs]
[<c0118714>] default_wake_function+0x0/0x12
[<c0118714>] default_wake_function+0x0/0x12
[<e0980bb5>] txLock+0x275/0x470 [jfs]
[<e097f5b1>] lbmStartIO+0xb1/0xe0 [jfs]
[<e097f46f>] lbmWrite+0x10f/0x150 [jfs]
[<e097c99e>] __get_metapage+0x5e/0x3d0 [jfs]
[<e097dcbd>] lmGCwrite+0xdd/0xf0 [jfs]
[<e096ce50>] diWrite+0x190/0x5f0 [jfs]
[<e0981473>] txCommit+0x1b3/0x320 [jfs]
[<e098083b>] txEnd+0x3b/0x140 [jfs]
[<e098351f>] jfs_sync+0x21f/0x2c0 [jfs]
[<c0118714>] default_wake_function+0x0/0x12
[<c0105d26>] ret_from_fork+0x6/0x14
[<c0118714>] default_wake_function+0x0/0x12
[<e0983300>] jfs_sync+0x0/0x2c0 [jfs]
[<c010425d>] kernel_thread_helper+0x5/0xb
This is the thread that tries to free some tlocks when they start
getting low. I think generic_make_request is just stack noise (a stale
return address, not part of the live call chain). We really don't want
txLockAlloc to block here. We try really hard not to exhaust all of the
tlocks so that this thread can make some progress.
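To make the idea of holding tlocks back for the sync thread concrete, here is a minimal user-space sketch. It is not the JFS code: the names tlock_pool, pool_alloc, pool_free and the values POOL_SIZE and RESERVE are all hypothetical, and the real allocator sleeps rather than returning false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not the JFS implementation: a fixed pool of
 * transaction locks with a small reserve that only the sync thread
 * is allowed to use. */
enum { POOL_SIZE = 8, RESERVE = 2 };    /* illustrative values only */

struct tlock_pool {
    int free_count;                     /* tlocks currently available */
};

static void pool_init(struct tlock_pool *p)
{
    p->free_count = POOL_SIZE;
}

/* Returns true if a tlock was handed out. Ordinary callers are refused
 * once only the reserve is left; the sync thread may dip into it. */
static bool pool_alloc(struct tlock_pool *p, bool is_sync_thread)
{
    int floor = is_sync_thread ? 0 : RESERVE;

    if (p->free_count <= floor)
        return false;                   /* a real caller would wait here */
    p->free_count--;
    return true;
}

/* Give one tlock back to the pool. */
static void pool_free(struct tlock_pool *p)
{
    assert(p->free_count < POOL_SIZE);
    p->free_count++;
}
```

In this model the sync thread is refused only when the pool is completely empty, which corresponds to the situation in the trace above: the one thread that is supposed to free tlocks ends up waiting for one itself.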
I think the real problem, though, is the jfsCommit thread. If it were
not blocked, I suspect that a lot of tlocks would be freed.
jfsCommit D C0348640 0 1802 1 1803 1801 (L-TLB)
de429ee0 00000046 dee18800 c0348640 df7dc000 dec76234 00004000 e09357e4
de428000 00270284 dbe060f0 00000000 f6e6e500 000f424a dee189a8 dec7607c
de429ef4 dee18800 dec76234 c026e3cd deeaa000 00001bcc dec76080 dec76080
Call Trace:
[<c026e3cd>] rwsem_down_read_failed+0x8f/0x17c
[<e097085d>] .text.lock.jfs_imap+0x3ff/0x422 [jfs]
[<e09824de>] txUpdateMap+0xae/0x250 [jfs]
[<e09808d7>] txEnd+0xd7/0x140 [jfs]
[<e0982d00>] txLazyCommit+0x20/0xe0 [jfs]
[<e0982f78>] jfs_lazycommit+0x1b8/0x1e0 [jfs]
[<c0118714>] default_wake_function+0x0/0x12
[<c0105d26>] ret_from_fork+0x6/0x14
[<c0118714>] default_wake_function+0x0/0x12
[<e0982dc0>] jfs_lazycommit+0x0/0x1e0 [jfs]
[<c010425d>] kernel_thread_helper+0x5/0xb
This thread must be in diUpdatePMap, trying to acquire ipimap's
rdwrlock: IREAD_LOCK(). The funny thing is that the only place a write
lock is taken on this inode is in diNewIAG, which is only called under
diAlloc, which I don't see in any stacks.
I can't find any error path that would leave the lock taken, so I can't
account for why this thread would be blocked. I'm sure you would have
noticed if a thread oopsed in this path, leaving the lock locked.
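For reference, this is the kind of error path I was looking for and did not find. The sketch below is a toy, non-blocking model in user-space C, not the JFS code: toy_rwlock, extend_map_leaky and extend_map_safe are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reader/writer lock state (hypothetical, never blocks). */
struct toy_rwlock {
    int  readers;                       /* number of read holders */
    bool writer;                        /* true while write-locked */
};

static bool toy_try_write(struct toy_rwlock *l)
{
    if (l->writer || l->readers > 0)
        return false;
    l->writer = true;
    return true;
}

static void toy_write_unlock(struct toy_rwlock *l)
{
    assert(l->writer);
    l->writer = false;
}

static bool toy_try_read(struct toy_rwlock *l)
{
    if (l->writer)
        return false;                   /* a real reader would sleep here */
    l->readers++;
    return true;
}

static void toy_read_unlock(struct toy_rwlock *l)
{
    assert(l->readers > 0);
    l->readers--;
}

/* Buggy shape: the early return on error skips the unlock. */
static int extend_map_leaky(struct toy_rwlock *l, bool fail)
{
    if (!toy_try_write(l))
        return -1;
    if (fail)
        return -2;                      /* BUG: write lock is still held */
    toy_write_unlock(l);
    return 0;
}

/* Safe shape: every path after the lock is taken leaves through "out". */
static int extend_map_safe(struct toy_rwlock *l, bool fail)
{
    int rc = 0;

    if (!toy_try_write(l))
        return -1;
    if (fail) {
        rc = -2;
        goto out;
    }
    /* ... the real work would go here ... */
out:
    toy_write_unlock(l);
    return rc;
}
```

After a failing call to extend_map_leaky, every later toy_try_read is refused, which is how a stuck reader with no visible writer would come about. As far as I can tell, the paths I checked all have the shape of extend_map_safe.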
> I am able to reproduce the problem on a laptop I have with me as well,
> I'm running a 2.6.8.1 kernel.
>
> It looks like all of the processes are stuck waiting on a semaphore
> somewhere in namei (.text.lock.namei) and upon reboot I have a bit of
> filesystem corruption.
I'd be interested in any info from jfs_fsck as to the nature of the
filesystem corruption.
> Shaggy will have to look into this some more, thanks for the report.
Thanks for all your help.
Shaggy
--
David Kleikamp
IBM Linux Technology Center