On Sat, Dec 18, 2004 at 02:49:38PM -0600, Jon Nelson wrote:
> 
> 
> I should note that if I crack open another terminal and strace the find, 
> this is what I get:
> 
> 
> open(".", O_RDONLY|O_LARGEFILE)         = 5
> fchdir(5)                               = 0
> lstat64(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> lstat64(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> chdir(".")                              = 0
> lstat64(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> lstat64(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> fstat64(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 5), ...}) = 0
> mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 
> 0) = 0x40252000
> write(1, ".\n", 2.
> )                      = 2
> open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 6
> fstat64(6, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> fcntl64(6, F_SETFD, FD_CLOEXEC)         = 0
> getdents(6,
> ^^^^^ hangs here.

Yeah, there appears to be a deadlock somewhere in the directory
handling code.  It's not always reproducible on my laptop; it
depends on how much memory is free, what else is running, etc.  But
I was able to trigger it once by simply running fstest and a find
concurrently, then killing fstest and running ls on the directory
where fstest had been running.

Here's what the backtraces look like:

fstest        D C0348640     0  5093      1          5094  5092 (NOTLB)
f07f9f14 00000082 f07f7870 c0348640 fffffff5 f13a992c c015eeca f13a992c 
       00000001 f07f9f70 f242ad10 00000000 23063c80 000f42cd f07f7a18 f13a9994 
       00000292 f07f8000 f07f7870 c026d7d7 f13a999c 00000001 f07f7870 c0118714 
Call Trace:
 [<c015eeca>] link_path_walk+0xccf/0xdb2
 [<c026d7d7>] __down+0x8b/0xfd
 [<c0118714>] default_wake_function+0x0/0x12
 [<c026d984>] __down_failed+0x8/0xc
 [<c0162261>] .text.lock.namei+0x109/0x168
 [<c01068f1>] error_code+0x2d/0x38
 [<c01181fd>] schedule_tail+0x41/0x4d
 [<c0105dfd>] sysenter_past_esp+0x52/0x71
fstest        D C0348640     0  5094      1          5095  5093 (NOTLB)
f07fbf14 00000082 f07f72e0 c0348640 fffffff5 f13a992c c015eeca f13a992c 
       00000001 f07fbf70 f242ad10 00000000 23063c80 000f42cd f07f7488 f13a9994 
       00000292 f07fa000 f07f72e0 c026d7d7 f13a999c 00000001 f07f72e0 c0118714 
Call Trace:
 [<c015eeca>] link_path_walk+0xccf/0xdb2
 [<c026d7d7>] __down+0x8b/0xfd
 [<c0118714>] default_wake_function+0x0/0x12
 [<c026d984>] __down_failed+0x8/0xc
 [<c0162261>] .text.lock.namei+0x109/0x168
 [<c01068f1>] error_code+0x2d/0x38
 [<c01181fd>] schedule_tail+0x41/0x4d
 [<c0105dfd>] sysenter_past_esp+0x52/0x71

<snip>
find          D C0348AE8     0  5115   4192                     (NOTLB)
f2233f24 00000082 f1526330 c0348ae8 000f42ce 0000fe04 00000000 00000000 
       0744ab6b 000f42ce f1526330 000f4240 074f6600 000f42ce f1526ff8 f13a9994 
       00000292 f2232000 f1526e50 c026d7d7 f13a999c 00000001 f1526e50 c0118714 
Call Trace:
 [<c026d7d7>] __down+0x8b/0xfd
 [<c0118714>] default_wake_function+0x0/0x12
 [<c026d984>] __down_failed+0x8/0xc
 [<c016393f>] .text.lock.readdir+0x5/0x16
 [<c0163900>] sys_getdents64+0x71/0xab
 [<c01637a4>] filldir64+0x0/0xeb
 [<c0105dfd>] sysenter_past_esp+0x52/0x71
ls            D C0348640     0  5117   4191                     (NOTLB)
f0581f24 00000082 f15268c0 c0348640 00000000 f15bf354 f15af400 00000000 
       f14b8140 f14b8160 f747c1b0 000f4240 70291b00 000f42d1 f1526a68 f13a9994 
       00000292 f0580000 f15268c0 c026d7d7 f13a999c 00000001 f15268c0 c0118714 
Call Trace:
 [<c026d7d7>] __down+0x8b/0xfd
 [<c0118714>] default_wake_function+0x0/0x12
 [<c026d984>] __down_failed+0x8/0xc
 [<c016393f>] .text.lock.readdir+0x5/0x16
 [<c0163900>] sys_getdents64+0x71/0xab
 [<c01637a4>] filldir64+0x0/0xeb
 [<c0105dfd>] sysenter_past_esp+0x52/0x71


So someone is holding the readdir semaphore here, but I can't tell
which thread.  find and ls are both blocked on it.  The jfsCommit
and jfsSync threads are also blocked on something, but I haven't been
able to get backtraces for those yet.
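
Until full backtraces are available, a rough way to see where the
remaining blocked threads are sleeping is to walk /proc while the hang
is live.  The following is only a sketch: the names parse_stat and
blocked_tasks are made up for illustration, and it assumes the running
kernel exposes /proc/<pid>/wchan and that the file is readable (it may
just show 0 on some configurations).

```python
import os

def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    pid = int(stat_text[:lparen])
    comm = stat_text[lparen + 1:rparen]
    state = stat_text[rparen + 1:].split()[0]
    return pid, comm, state

def blocked_tasks(proc="/proc"):
    """Yield (pid, comm, wchan) for tasks in uninterruptible sleep (state D)."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
            if state != "D":
                continue
            # Assumption: wchan is available and names the sleeping function.
            with open(os.path.join(proc, entry, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            continue  # task exited, or the file is not readable
        yield pid, comm, wchan
```

Looping over blocked_tasks() and printing each tuple during the hang
should list find, ls and the fstest threads, plus jfsCommit and jfsSync
if they really are in D state, each with whatever the kernel reports
as its wait channel.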

All of the fstest threads seem to be stuck in some error path.
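
For longer dumps it can help to group the blocked tasks by the lock
slow path they are sitting in, rather than eyeballing each trace.  A
minimal sketch, assuming the dump is in the same format as the traces
quoted above; group_blocked_tasks and the two regexes are hypothetical
names used only for this example.

```python
import re
from collections import defaultdict

# Task header, e.g. "find          D C0348AE8     0  5115   4192 ... (NOTLB)"
TASK_RE = re.compile(r"^(\S+)\s+([RSDTZ])\s+[0-9A-Fa-f]{8}\s+\d+\s+(\d+)\b")
# One call-trace frame, e.g. " [<c0163900>] sys_getdents64+0x71/0xab"
FRAME_RE = re.compile(r"^\s*\[<[0-9a-f]+>\]\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+")

def group_blocked_tasks(dump):
    """Map each .text.lock.* frame to the (name, pid) of D-state tasks showing it."""
    groups = defaultdict(list)
    current = None  # (name, pid) of the D-state task being read, else None
    for line in dump.splitlines():
        task = TASK_RE.match(line)
        if task:
            name, state, pid = task.group(1), task.group(2), int(task.group(3))
            current = (name, pid) if state == "D" else None
            continue
        frame = FRAME_RE.match(line)
        if frame and current and frame.group(1).startswith(".text.lock."):
            groups[frame.group(1)].append(current)
    return dict(groups)
```

Fed the excerpt above, this puts fstest 5093 and 5094 under
.text.lock.namei, and find 5115 and ls 5117 under .text.lock.readdir.
It only shows who is waiting where; it does not identify the holder.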

I'll follow up if I find more.

Sonny
_______________________________________________
Jfs-discussion mailing list
[EMAIL PROTECTED]
http://www-124.ibm.com/developerworks/oss/mailman/listinfo/jfs-discussion
