I've seen this happen 3 times during 4.10rc.

When running trinity, it gets 'stuck', with all but one process
stuck on a lock.  I've left this running for days, and it never makes
progress.  The process holding the lock seems to be stuck somewhere.

When this happens it's pretty apparent in ps axf output.

10129 pts/1    SL+    0:36     \_ ../trinity -C64 -D -q -l off --dropprivs -N 
1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl --dr
10165 ?        DNs    0:03         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
10186 ?        DNs    0:01         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
10187 ?        DNs    0:02         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
12533 ?        DNs    0:01         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
14334 ?        DNs    0:01         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
14356 ?        Ds     0:01         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
14532 ?        DNs    0:00         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
15149 ?        DNs    0:00         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
15214 ?        DNs    0:00         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
15588 ?        DNs    0:00         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
15657 ?        DNs    0:00         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
15850 pts/1    DN     0:00         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
16092 ?        DNs    0:00         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
16439 pts/1    DN     0:00         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
16772 ?        RNs  2498:28         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl
17033 ?        DNs    0:00         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
17420 ?        Ds     0:00         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
17438 ?        DNs    0:00         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
17535 ?        Ds     0:00         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
17690 ?        DNs    0:00         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
17711 ?        DNs    0:00         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
17730 ?        Ds     0:00         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
17760 ?        DNs    0:00         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
17863 ?        DNs    0:00         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
17990 ?        DNs    0:00         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
18387 pts/1    DL+    0:00         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
18500 pts/1    D+     0:00         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
18625 ?        Ds     0:00         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
18748 ?        Ds     0:00         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl 
18807 pts/1    D+     0:00         \_ ../trinity -C64 -D -q -l off --dropprivs 
-N 1000000 -a64 --disable-fds=perf --enable-fds=testfile -x ioctl

(Note the 2498:28 runtime of the only running child task).

As mentioned, all the blocked children are sleeping..

18807
[<ffffffff813dd9c7>] call_rwsem_down_write_failed+0x17/0x30
[<ffffffffa0094beb>] btrfs_fallocate+0xcb/0x11d0 [btrfs]
[<ffffffff81245453>] vfs_fallocate+0x143/0x230
[<ffffffff81246358>] SyS_fallocate+0x48/0x80
[<ffffffff81002d81>] do_syscall_64+0x61/0x170
[<ffffffff818cb3cb>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff

18748
[<ffffffff813dd9c7>] call_rwsem_down_write_failed+0x17/0x30
[<ffffffffa0096d02>] btrfs_sync_file+0x162/0x4c0 [btrfs]
[<ffffffff81283d8b>] vfs_fsync_range+0x4b/0xb0
[<ffffffff81283e4d>] do_fsync+0x3d/0x70
[<ffffffff81284100>] SyS_fsync+0x10/0x20
[<ffffffff81002d81>] do_syscall_64+0x61/0x170
[<ffffffff818cb3cb>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff

18625
[<ffffffff8126bdf9>] __fdget_pos+0x49/0x50
[<ffffffff8124a7ce>] SyS_write+0x2e/0xc0
[<ffffffff81002d81>] do_syscall_64+0x61/0x170
[<ffffffff818cb3cb>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff

18500
[<ffffffff813dd9c7>] call_rwsem_down_write_failed+0x17/0x30
[<ffffffffa00970d8>] btrfs_file_write_iter+0x78/0x570 [btrfs]
[<ffffffff81248888>] do_iter_readv_writev+0xb8/0x120
[<ffffffff81249594>] do_readv_writev+0x1a4/0x250
[<ffffffff8124989f>] vfs_writev+0x3f/0x50
[<ffffffff81249a65>] do_pwritev+0xb5/0xd0
[<ffffffff8124ac37>] SyS_pwritev2+0x17/0x30
[<ffffffff81002d81>] do_syscall_64+0x61/0x170
[<ffffffff818cb3cb>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff

18387
[<ffffffff813dd9c7>] call_rwsem_down_write_failed+0x17/0x30
[<ffffffffa0096d02>] btrfs_sync_file+0x162/0x4c0 [btrfs]
[<ffffffff81283d8b>] vfs_fsync_range+0x4b/0xb0
[<ffffffff81283e4d>] do_fsync+0x3d/0x70
[<ffffffff81284100>] SyS_fsync+0x10/0x20
[<ffffffff81002d81>] do_syscall_64+0x61/0x170
[<ffffffff818cb3cb>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff

17990
[<ffffffff8126bdf9>] __fdget_pos+0x49/0x50
[<ffffffff81248d5d>] SyS_lseek+0x1d/0xb0
[<ffffffff81002d81>] do_syscall_64+0x61/0x170
[<ffffffff818cb3cb>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff

17863
[<ffffffff813dd9c7>] call_rwsem_down_write_failed+0x17/0x30
[<ffffffff812456ca>] chmod_common+0x9a/0x150
[<ffffffff81246afa>] SyS_fchmod+0x3a/0x70
[<ffffffff81002d81>] do_syscall_64+0x61/0x170
[<ffffffff818cb3cb>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff

17760
[<ffffffff8126bdf9>] __fdget_pos+0x49/0x50
[<ffffffff812498e3>] do_writev+0x33/0x100
[<ffffffff8124aba0>] SyS_writev+0x10/0x20
[<ffffffff81002d81>] do_syscall_64+0x61/0x170
[<ffffffff818cb3cb>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff

17730
[<ffffffff8126bdf9>] __fdget_pos+0x49/0x50
[<ffffffff8124a7ce>] SyS_write+0x2e/0xc0
[<ffffffff81002d81>] do_syscall_64+0x61/0x170
[<ffffffff818cb3cb>] entry_SYSCALL64_slow_path+0x25/0x25
[<ffffffffffffffff>] 0xffffffffffffffff

etc etc.



Function graph trace of the running pid :
http://codemonkey.org.uk/junk/btrfs-stuck3.txt
(There are also stuck1 and stuck2 in that dir from other runs if they're 
interesting)


Also of note, is that the running task isn't killable.

iotop shows that there's almost no IO going on at all.

The syscall that the running child is trying to do is pretty simple..

fallocate(fd=7, mode=0x3, offset=114, len=9440) 

perhaps interesting, is that the file backing that fd has been unlinked.

# ll /proc/16772/fd/7
lrwx------ 1 root root 64 Jan 13 11:21 /proc/16772/fd/7 -> 
/mnt/ssd/trinity/tmp/trinity.9kTIQN/tmp/trinity-testfile4 (deleted)


        Dave

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to