Hi,

As I said on the call, I won't have much time to spend on this for a while,
but there are still a couple of 9P issues that are easy to reproduce:


- The first is very aggressive fd usage: fds get opened and don't seem to
ever be closed.

That's easy to reproduce: just create a new file with touch, and you'll
see that the setattr2 call from 9p (for touch's utimensat) opens the
file's global fd, which never gets closed.
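
For instance, here's a quick way to watch it (assuming the daemon is
called ganesha.nfsd and /mnt is the 9p mount; adjust as needed):

pid=$(pidof ganesha.nfsd)
# each touch triggers utimensat -> setattr2, which opens a global fd that is never closed
for i in $(seq 1 20); do touch /mnt/leak.$i; echo "$i: $(ls /proc/$pid/fd | wc -l) open fds"; done

The count should only ever go up, roughly one fd per file created.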

A possible fix we discussed was opening the fd in the state's context so
it would get closed when we clear the state, but there's a getattr call
right after the setattr (coming from mdcache's refresh_attr) which does
not have a state and will open the global fd if it hadn't been opened
before...

A short-term solution would be to ensure that for 9p we set closefd =
true somehow (I haven't found how to do that for getattr), but it's a bit
ugly.

A longer-term solution might be to keep a list of all the open fds for a
given file somewhere and have getattr use another state's fd?
If we did that, we could open the state's fd in the setattr, the
following getattr wouldn't have to reopen it, and it would be closed when
we clunk the fid.
Alternatively we could work harder to keep track of opened global fds and
reclaim/close them faster somehow; as it stands, ganesha runs out of fds
in less than a minute just creating files (see the commands just below)...
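
To see how fast it hits the limit (same assumptions as above about the
daemon name and mount point):

grep 'Max open files' /proc/$(pidof ganesha.nfsd)/limits
# keep creating files until touch starts failing, i.e. ganesha has run out of fds
i=0; while touch /mnt/f.$i; do i=$((i+1)); done; echo "failed after $i files"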

Thoughts?

(I worked around this by forcing closefd to true and making find_fd in
VFS/file.c open a local fd; this is what happens on the first getattr
before the handle knows it's a file - basically just bypassing the
switch - but I don't think we want that to land)



- 9p lock problem: I didn't bring it up on the call, but the cthon lock
tests just fail sometimes...
It fails in cthon's lock test #7, either 7.3 or 7.5. I haven't spent any
time on it yet, but it might be related to the refcount problems, so I
figured I'd bring it up while I'm summing things up...
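
Looping runtests and stopping on the first failure should catch it
(untested sketch; it assumes runtests propagates the failing test's exit
status, otherwise grep its output instead):

n=0
while /path/to/cthon04/runtests -l -t /mnt/tests; do n=$((n+1)); done
echo "failed on pass $((n+1))"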


- 9p refcount/lock use-after-free (which is just that we keep using the
entry after its refcount has been dropped to 0)

Just mount then run cthon lock tests in a loop:
mount -t 9p -o aname=export,cache=mmap,privport=1,posixacl,msize=1048576,trans=tcp 10.251.0.1 /mnt
while date; do /path/to/cthon04/runtests -l -t /mnt/tests; done

It will eventually crash. Originally it crashed in a clunk just after
unlock, but I removed some double-refs ("pin ref" no longer makes sense
now that the refs are absolute, see https://review.gerrithub.io/347508 )
and it now crashes directly in unlock, which might make things easier to
track.
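
For what it's worth, the report below comes from an ASAN build; something
along these lines should get an equivalent one, though the exact cmake
flags here are just an assumption and may differ from what I have locally:

cmake -DCMAKE_C_COMPILER=clang -DCMAKE_C_FLAGS="-fsanitize=address -g" \
      -DCMAKE_EXE_LINKER_FLAGS="-fsanitize=address" /path/to/nfs-ganesha/src
make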

It always seems to crash in cthon's lock test #14; here are the backtraces
from ASAN:
==4021==ERROR: AddressSanitizer: heap-use-after-free on address 0x61b0000793d0 
at pc 0x0000005f22fa bp 0x7fffe336c8d0 sp 0x7f8
READ of size 8 at 0x61b0000793d0 thread T15
Detaching after fork from child process 4075.
    #0 0x5f22f9 in do_lock_op /export/nfs-ganesha/src/SAL/state_lock.c:2314:13
    #1 0x5fb383 in state_unlock /export/nfs-ganesha/src/SAL/state_lock.c:2937:11
    #2 0x5d469c in _9p_lock /export/nfs-ganesha/src/Protocols/9P/9p_lock.c:182:7
    #3 0x5c926b in _9p_process_buffer 
/export/nfs-ganesha/src/Protocols/9P/9p_interpreter.c:180:7
    #4 0x522da4 in _9p_rdma_process_request 
/export/nfs-ganesha/src/MainNFSD/9p_rdma_callbacks.c:158:8
    #5 0x475f73 in _9p_execute 
/export/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1474:3
    #6 0x475f73 in worker_run 
/export/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1585
    #7 0x6961e4 in fridgethr_start_routine 
/export/nfs-ganesha/src/support/fridgethr.c:550:3
    #8 0x7ffff6bab6c9 in start_thread (/lib64/libpthread.so.0+0x76c9)
    #9 0x7ffff436af7e in __GI___clone (/lib64/libc.so.6+0x107f7e)

0x61b0000793d0 is located 80 bytes inside of 1480-byte region 
[0x61b000079380,0x61b000079948)
freed by thread T15 here:
    #0 0x7ffff6ea16a0 in __interceptor_cfree 
(/usr/lib64/clang/3.8.1/lib/libclang_rt.asan-x86_64.so+0xdf6a0)
    #1 0x6ded4d in gsh_free /export/nfs-ganesha/src/include/abstract_mem.h:271:2
    #2 0x6ded4d in pool_free /export/nfs-ganesha/src/include/abstract_mem.h:420
    #3 0x6ded4d in _mdcache_lru_unref 
/export/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1496
    #4 0x5fb350 in state_unlock /export/nfs-ganesha/src/SAL/state_lock.c:2920:3
    #5 0x5d469c in _9p_lock /export/nfs-ganesha/src/Protocols/9P/9p_lock.c:182:7
    #6 0x5c926b in _9p_process_buffer 
/export/nfs-ganesha/src/Protocols/9P/9p_interpreter.c:180:7
    #7 0x522da4 in _9p_rdma_process_request 
/export/nfs-ganesha/src/MainNFSD/9p_rdma_callbacks.c:158:8
    #8 0x475f73 in _9p_execute 
/export/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1474:3
    #9 0x475f73 in worker_run 
/export/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1585
    #10 0x6961e4 in fridgethr_start_routine 
/export/nfs-ganesha/src/support/fridgethr.c:550:3
    #11 0x7ffff6bab6c9 in start_thread (/lib64/libpthread.so.0+0x76c9)

previously allocated by thread T35 here:
    #0 0x7ffff6ea19b0 in calloc 
(/usr/lib64/clang/3.8.1/lib/libclang_rt.asan-x86_64.so+0xdf9b0)
    #1 0x6dbef3 in gsh_calloc__ 
/export/nfs-ganesha/src/include/abstract_mem.h:145:12
    #2 0x6dbef3 in pool_alloc__ 
/export/nfs-ganesha/src/include/abstract_mem.h:395
    #3 0x6dbef3 in alloc_cache_entry 
/export/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1172
    #4 0x6dbef3 in mdcache_lru_get 
/export/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1212
    #5 0x6f55ff in mdcache_alloc_handle 
/export/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:117:2
    #6 0x6f55ff in mdcache_new_entry 
/export/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:412
    #7 0x6e2d5d in mdcache_alloc_and_check_handle 
/export/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1
    #8 0x6f0a9f in mdcache_open2 
/export/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_file.c:702:11
    #9 0x440098 in open2_by_name 
/export/nfs-ganesha/src/FSAL/fsal_helper.c:399:11
    #10 0x44881b in fsal_open2 
/export/nfs-ganesha/src/FSAL/fsal_helper.c:1808:10
    #11 0x5d15b1 in _9p_lcreate 
/export/nfs-ganesha/src/Protocols/9P/9p_lcreate.c:134:17
    #12 0x5c926b in _9p_process_buffer 
/export/nfs-ganesha/src/Protocols/9P/9p_interpreter.c:180:7
    #13 0x522da4 in _9p_rdma_process_request 
/export/nfs-ganesha/src/MainNFSD/9p_rdma_callbacks.c:158:8
    #14 0x475f73 in _9p_execute 
/export/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1474:3
    #15 0x475f73 in worker_run 
/export/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1585
    #16 0x6961e4 in fridgethr_start_routine 
/export/nfs-ganesha/src/support/fridgethr.c:550:3
    #17 0x7ffff6bab6c9 in start_thread (/lib64/libpthread.so.0+0x76c9)

Thread T15 created by T0 here:
    #0 0x7ffff6e16009 in pthread_create 
(/usr/lib64/clang/3.8.1/lib/libclang_rt.asan-x86_64.so+0x54009)
    #1 0x695506 in fridgethr_populate 
/export/nfs-ganesha/src/support/fridgethr.c:1418:8
    #2 0x475724 in worker_init 
/export/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1642:7
    #3 0x514dca in nfs_Start_threads 
/export/nfs-ganesha/src/MainNFSD/nfs_init.c:457:7
    #4 0x514dca in nfs_start /export/nfs-ganesha/src/MainNFSD/nfs_init.c:879
    #5 0x41daac in main /export/nfs-ganesha/src/MainNFSD/nfs_main.c:479:2
    #6 0x7ffff4283400 in __libc_start_main (/lib64/libc.so.6+0x20400)

Thread T35 created by T0 here:
    #0 0x7ffff6e16009 in pthread_create 
(/usr/lib64/clang/3.8.1/lib/libclang_rt.asan-x86_64.so+0x54009)
    #1 0x695506 in fridgethr_populate 
/export/nfs-ganesha/src/support/fridgethr.c:1418:8
    #2 0x475724 in worker_init 
/export/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1642:7
    #3 0x514dca in nfs_Start_threads 
/export/nfs-ganesha/src/MainNFSD/nfs_init.c:457:7
    #4 0x514dca in nfs_start /export/nfs-ganesha/src/MainNFSD/nfs_init.c:879
    #5 0x41daac in main /export/nfs-ganesha/src/MainNFSD/nfs_main.c:479:2
    #6 0x7ffff4283400 in __libc_start_main (/lib64/libc.so.6+0x20400)

(it crashes in ~5 minutes for me; I could provide full logs, but it might
be easier to reproduce locally given we don't log ref counting at all)



Thanks for any time spent on this, it'd be nice™ to have it fixed
before we release 2.5.
We've got other problems here so time is scarce, but if need be I'll
find some time again around cthon's dates.

Cheers,
-- 
Dominique
