On Sat, Dec 18, 2004 at 02:49:38PM -0600, Jon Nelson wrote:
>
>
> I should note that if I crack open another terminal and strace the find,
> this is what I get:
>
>
> open(".", O_RDONLY|O_LARGEFILE) = 5
> fchdir(5) = 0
> lstat64(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> lstat64(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> chdir(".") = 0
> lstat64(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> lstat64(".", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> fstat64(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 5), ...}) = 0
> mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40252000
> write(1, ".\n", 2) = 2
> open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 6
> fstat64(6, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
> fcntl64(6, F_SETFD, FD_CLOEXEC) = 0
> getdents(6,
> ^^^^^ hangs here.

Yeah, there appears to be a deadlock somewhere in the directory
handling code. It's not always reproducible on my laptop; it seems to
depend on how much memory is free, what else is running, etc. But I
was able to trigger it once by simply running fstest and a find
concurrently, then killing fstest and running ls on the directory
where fstest was running.
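
For anyone who wants to try the same sequence without losing a terminal to a hung ls, here is a rough Python sketch. It is only an illustration: the fstest command line is not given in this thread, so `workload_cmd` is a placeholder you would have to fill in, and the function names and timeouts are made up for the example. The probe polls the child instead of waiting on it, since a process stuck in uninterruptible sleep may not die even after being killed.

```python
import subprocess
import sys
import time


def finishes_within(cmd, timeout_s):
    """Return True if cmd exits within timeout_s seconds, else False."""
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            return True
        time.sleep(0.05)
    if proc.poll() is not None:
        return True
    # Still running: treat as hung. Try to kill it, but do not block
    # forever, because a task in D state ignores signals until it wakes.
    proc.kill()
    try:
        proc.wait(timeout=1.0)
    except subprocess.TimeoutExpired:
        print("probe did not die after kill (possibly in D state)",
              file=sys.stderr)
    return False


def try_reproduce(workload_cmd, directory, run_s=30.0, probe_timeout_s=10.0):
    """Run a workload and find concurrently, kill the workload, probe with ls.

    workload_cmd is a placeholder for the stress test (e.g. fstest with
    whatever arguments you use). Returns True if ls appears to hang.
    """
    workload = subprocess.Popen(workload_cmd, cwd=directory)
    finder = subprocess.Popen(["find", "."], cwd=directory,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
    time.sleep(run_s)
    # Note: this only signals the process we started, not any children
    # it may have forked.
    workload.kill()
    hung = not finishes_within(["ls", directory], probe_timeout_s)
    finder.kill()
    return hung
```

A timeout on ls does not prove a deadlock by itself (a slow disk looks the same), so the task state still needs checking afterwards.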

Here's what the backtraces look like:

fstest D C0348640 0 5093 1 5094 5092 (NOTLB)
f07f9f14 00000082 f07f7870 c0348640 fffffff5 f13a992c c015eeca f13a992c
00000001 f07f9f70 f242ad10 00000000 23063c80 000f42cd f07f7a18 f13a9994
00000292 f07f8000 f07f7870 c026d7d7 f13a999c 00000001 f07f7870 c0118714
Call Trace:
[<c015eeca>] link_path_walk+0xccf/0xdb2
[<c026d7d7>] __down+0x8b/0xfd
[<c0118714>] default_wake_function+0x0/0x12
[<c026d984>] __down_failed+0x8/0xc
[<c0162261>] .text.lock.namei+0x109/0x168
[<c01068f1>] error_code+0x2d/0x38
[<c01181fd>] schedule_tail+0x41/0x4d
[<c0105dfd>] sysenter_past_esp+0x52/0x71
fstest D C0348640 0 5094 1 5095 5093 (NOTLB)
f07fbf14 00000082 f07f72e0 c0348640 fffffff5 f13a992c c015eeca f13a992c
00000001 f07fbf70 f242ad10 00000000 23063c80 000f42cd f07f7488 f13a9994
00000292 f07fa000 f07f72e0 c026d7d7 f13a999c 00000001 f07f72e0 c0118714
Call Trace:
[<c015eeca>] link_path_walk+0xccf/0xdb2
[<c026d7d7>] __down+0x8b/0xfd
[<c0118714>] default_wake_function+0x0/0x12
[<c026d984>] __down_failed+0x8/0xc
[<c0162261>] .text.lock.namei+0x109/0x168
[<c01068f1>] error_code+0x2d/0x38
[<c01181fd>] schedule_tail+0x41/0x4d
[<c0105dfd>] sysenter_past_esp+0x52/0x71
<snip>
find D C0348AE8 0 5115 4192 (NOTLB)
f2233f24 00000082 f1526330 c0348ae8 000f42ce 0000fe04 00000000 00000000
0744ab6b 000f42ce f1526330 000f4240 074f6600 000f42ce f1526ff8 f13a9994
00000292 f2232000 f1526e50 c026d7d7 f13a999c 00000001 f1526e50 c0118714
Call Trace:
[<c026d7d7>] __down+0x8b/0xfd
[<c0118714>] default_wake_function+0x0/0x12
[<c026d984>] __down_failed+0x8/0xc
[<c016393f>] .text.lock.readdir+0x5/0x16
[<c0163900>] sys_getdents64+0x71/0xab
[<c01637a4>] filldir64+0x0/0xeb
[<c0105dfd>] sysenter_past_esp+0x52/0x71
ls D C0348640 0 5117 4191 (NOTLB)
f0581f24 00000082 f15268c0 c0348640 00000000 f15bf354 f15af400 00000000
f14b8140 f14b8160 f747c1b0 000f4240 70291b00 000f42d1 f1526a68 f13a9994
00000292 f0580000 f15268c0 c026d7d7 f13a999c 00000001 f15268c0 c0118714
Call Trace:
[<c026d7d7>] __down+0x8b/0xfd
[<c0118714>] default_wake_function+0x0/0x12
[<c026d984>] __down_failed+0x8/0xc
[<c016393f>] .text.lock.readdir+0x5/0x16
[<c0163900>] sys_getdents64+0x71/0xab
[<c01637a4>] filldir64+0x0/0xeb
[<c0105dfd>] sysenter_past_esp+0x52/0x71

So someone is holding the readdir semaphore here, but I can't tell
which thread. find and ls are both blocking on that. The jfsCommit
and jfsSync threads are also blocking on something, but I haven't been
able to get the backtraces for those yet.
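
To make it easier to see which tasks are waiting where, a dump in the format above can be summarized with a small script. The following is a sketch in Python under a few assumptions: the regular expressions are fitted only to the output shown in this mail, the function names are invented for the example, and a `.text.lock.<file>` frame is taken to mean the task is in a lock slow path originating from that source file. It cannot tell you who holds the semaphore, only who is waiting on what.

```python
import re
from collections import defaultdict

# "find D C0348AE8 0 5115 4192 (NOTLB)" -> name, state, pid
HEADER_RE = re.compile(r"^(\S+)\s+([A-Z])\s+[0-9A-Fa-f]{8}\s+\d+\s+(\d+)\b")
# "[<c016393f>] .text.lock.readdir+0x5/0x16" -> symbol name
FRAME_RE = re.compile(r"^\s*\[<[0-9a-f]{8}>\]\s+(\S+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def parse_task_dump(text):
    """Parse a task dump into a list of dicts with name, state, pid, trace."""
    tasks = []
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            tasks.append({"name": header.group(1),
                          "state": header.group(2),
                          "pid": int(header.group(3)),
                          "trace": []})
            continue
        frame = FRAME_RE.match(line)
        if frame and tasks:
            tasks[-1]["trace"].append(frame.group(1))
    return tasks


def group_by_lock_section(tasks):
    """Group (name, pid) pairs by the first .text.lock.* frame in each trace."""
    groups = defaultdict(list)
    for task in tasks:
        section = next((sym for sym in task["trace"]
                        if sym.startswith(".text.lock.")), None)
        groups[section].append((task["name"], task["pid"]))
    return dict(groups)


# Two tasks abbreviated from the dump in this mail.
SAMPLE = """\
fstest D C0348640 0 5093 1 5094 5092 (NOTLB)
Call Trace:
[<c015eeca>] link_path_walk+0xccf/0xdb2
[<c026d7d7>] __down+0x8b/0xfd
[<c0162261>] .text.lock.namei+0x109/0x168
find D C0348AE8 0 5115 4192 (NOTLB)
Call Trace:
[<c026d7d7>] __down+0x8b/0xfd
[<c016393f>] .text.lock.readdir+0x5/0x16
[<c0163900>] sys_getdents64+0x71/0xab
"""

if __name__ == "__main__":
    for section, members in group_by_lock_section(parse_task_dump(SAMPLE)).items():
        print(section, members)
```

On the full dump this would put the fstest tasks under `.text.lock.namei` and find and ls under `.text.lock.readdir`, which matches what the traces show by eye.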

All of the fstest threads seem to be stuck in some error path
(error_code shows up in each of their traces).
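
As a quick way to list everything that is currently stuck, including kernel threads, one can scan /proc for tasks in state D. The sketch below is in Python and assumes a Linux /proc where `/proc/<pid>/stat` carries the state as the field after the parenthesized command name and, where the kernel exposes it, `/proc/<pid>/wchan` names the function the task is sleeping in. The function names are made up for the example, and only entries directly under /proc are scanned, not per-thread entries under `task/`.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the first " (" and the last ")".
    """
    pid_part, rest = line.split(" (", 1)
    comm, after = rest.rsplit(")", 1)
    state = after.split()[0]
    return int(pid_part), comm, state


def blocked_tasks(proc_root="/proc"):
    """Return a list of (pid, comm, wchan) for tasks in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # task exited while we were looking, or odd format
        if state != "D":
            continue
        try:
            with open(os.path.join(proc_root, entry, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            wchan = "?"  # wchan not available on this kernel/config
        blocked.append((pid, comm, wchan))
    return blocked


if __name__ == "__main__":
    for pid, comm, wchan in blocked_tasks():
        print(pid, comm, wchan)
```

This only shows where each task is sleeping, not the full call chain, so it complements the backtraces rather than replacing them.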

I'll follow up if I find more.

Sonny
_______________________________________________
Jfs-discussion mailing list
http://www-124.ibm.com/developerworks/oss/mailman/listinfo/jfs-discussion