Hi,

As I said on the call, I won't have much time to spend on this for a while, but there are still a couple of issues with 9P that are easy to reproduce:
- First is very aggressive fd usage: fds don't seem to ever get closed. That's easy to reproduce: just create a new file with touch, and you'll see that the setattr2 call from 9P (for touch's utimensat) opens the file's global fd, which never gets closed. A possible fix we discussed was opening in the state's context so it'd get closed when we clear the state, but there's a getattr call right after the setattr (coming from mdcache's refresh_attr) which does not have a state and will open the global fd if it hadn't been opened before... A short-term solution would be to ensure that for 9P we set closefd = true somehow (I haven't found how for getattr), but it's a bit ugly. A longer-term solution might be to keep a list of all the open fds for a given file somewhere and have getattr use another state's fd? If we did that, we could open the state's fd in the setattr, the following getattr wouldn't have to reopen, and it'd be closed when we clunk the fid. Alternatively we could work harder to keep track of opened global fds and reclaim/close them faster somehow; as it stands, ganesha runs out of fds in less than a minute just creating files... Thoughts?
  (I worked around this by forcing closefd to true and making find_fd in VFS/file.c open a local fd; this is what happens on the first getattr before the handle knows it's a file - basically just bypassing the switch - but I don't think we want that to land. See the sketch after this list.)

- 9P lock problem: I didn't bring it up on the call, but the cthon lock tests just fail sometimes... It fails in cthon's lock test #7, either 7.3 or 7.5. No time spent there yet, but it might be related to the refcount problems, so I figured I'd bring it up while I'm summing things up...

- 9P refcount/lock use-after-free (which is just that we keep using the entry after its refcount has dropped to 0). Just mount, then run the cthon lock tests in a loop:

  mount -t 9p -o aname=export,cache=mmap,privport=1,posixacl,msize=1048576,trans=tcp 10.251.0.1 /mnt
  while date; do /path/to/cthon04/runtests -l -t /mnt/tests; done

  It will eventually crash, originally in a clunk just after unlock, but I removed some double-refs ("pin ref" no longer makes sense now that the refs are absolute, see https://review.gerrithub.io/347508) and it now crashes directly in unlock, which might make things easier to track. It always seems to crash in cthon's lock test #14; see the sketch and backtraces after this list.
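On the first issue, here's a minimal standalone sketch of what the closefd workaround boils down to; this is illustrative C with made-up names (getattr_local_fd is not the real API), not the actual FSAL_VFS diff:

/* Standalone sketch of the closefd workaround idea, NOT the actual
 * FSAL_VFS patch: fetch attributes on a short-lived local fd and
 * always close it, instead of caching it in the handle's global fd.
 * getattr_local_fd() is a made-up name for illustration. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static int getattr_local_fd(const char *path, struct stat *st)
{
	int fd = open(path, O_RDONLY);	/* local fd, never cached */
	int rc = -1;

	if (fd < 0)
		return -1;
	if (fstat(fd, st) == 0)
		rc = 0;
	close(fd);	/* the closefd = true part: nothing leaks past here */
	return rc;
}

int main(void)
{
	struct stat st;

	if (getattr_local_fd("/etc/hostname", &st) == 0)
		printf("size=%lld\n", (long long)st.st_size);
	return 0;
}

The real find_fd change is messier since it has to thread closefd through the existing switch, but the effect is the same: the fd only lives for the duration of the call.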
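On the third issue, here's a standalone sketch (again with made-up names, not the real SAL/mdcache code) of the unref-then-use ordering the traces below point at:

/* Standalone sketch of the pattern in the traces: state_unlock drops
 * the entry's last ref (pool_free runs via _mdcache_lru_unref) and
 * do_lock_op then reads freed memory. The fix direction is simply to
 * keep the ref until after the last use. */
#include <stdio.h>
#include <stdlib.h>

struct entry {
	int refcnt;
	int lock_state;
};

void entry_unref(struct entry *e)
{
	if (--e->refcnt == 0)
		free(e);	/* entry is gone after this point */
}

void do_lock_op(struct entry *e)
{
	printf("lock_state=%d\n", e->lock_state);
}

/* Buggy ordering, as in state_lock.c:2920 (unref) then :2937 (use):
 * ASAN flags the do_lock_op() read as heap-use-after-free. */
void state_unlock_buggy(struct entry *e)
{
	entry_unref(e);		/* may drop the last ref... */
	do_lock_op(e);		/* ...then dereference freed memory */
}

/* Fixed ordering: finish using the entry before dropping the ref. */
void state_unlock_fixed(struct entry *e)
{
	do_lock_op(e);
	entry_unref(e);
}

int main(void)
{
	struct entry *e = calloc(1, sizeof(*e));

	if (!e)
		return 1;
	e->refcnt = 1;
	state_unlock_fixed(e);	/* swap in the buggy variant under ASAN to see the report */
	return 0;
}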
Here are the backtraces from ASAN:

==4021==ERROR: AddressSanitizer: heap-use-after-free on address 0x61b0000793d0 at pc 0x0000005f22fa bp 0x7fffe336c8d0 sp 0x7f8
READ of size 8 at 0x61b0000793d0 thread T15
Detaching after fork from child process 4075.
    #0 0x5f22f9 in do_lock_op /export/nfs-ganesha/src/SAL/state_lock.c:2314:13
    #1 0x5fb383 in state_unlock /export/nfs-ganesha/src/SAL/state_lock.c:2937:11
    #2 0x5d469c in _9p_lock /export/nfs-ganesha/src/Protocols/9P/9p_lock.c:182:7
    #3 0x5c926b in _9p_process_buffer /export/nfs-ganesha/src/Protocols/9P/9p_interpreter.c:180:7
    #4 0x522da4 in _9p_rdma_process_request /export/nfs-ganesha/src/MainNFSD/9p_rdma_callbacks.c:158:8
    #5 0x475f73 in _9p_execute /export/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1474:3
    #6 0x475f73 in worker_run /export/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1585
    #7 0x6961e4 in fridgethr_start_routine /export/nfs-ganesha/src/support/fridgethr.c:550:3
    #8 0x7ffff6bab6c9 in start_thread (/lib64/libpthread.so.0+0x76c9)
    #9 0x7ffff436af7e in __GI___clone (/lib64/libc.so.6+0x107f7e)

0x61b0000793d0 is located 80 bytes inside of 1480-byte region [0x61b000079380,0x61b000079948)
freed by thread T15 here:
    #0 0x7ffff6ea16a0 in __interceptor_cfree (/usr/lib64/clang/3.8.1/lib/libclang_rt.asan-x86_64.so+0xdf6a0)
    #1 0x6ded4d in gsh_free /export/nfs-ganesha/src/include/abstract_mem.h:271:2
    #2 0x6ded4d in pool_free /export/nfs-ganesha/src/include/abstract_mem.h:420
    #3 0x6ded4d in _mdcache_lru_unref /export/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1496
    #4 0x5fb350 in state_unlock /export/nfs-ganesha/src/SAL/state_lock.c:2920:3
    #5 0x5d469c in _9p_lock /export/nfs-ganesha/src/Protocols/9P/9p_lock.c:182:7
    #6 0x5c926b in _9p_process_buffer /export/nfs-ganesha/src/Protocols/9P/9p_interpreter.c:180:7
    #7 0x522da4 in _9p_rdma_process_request /export/nfs-ganesha/src/MainNFSD/9p_rdma_callbacks.c:158:8
    #8 0x475f73 in _9p_execute /export/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1474:3
    #9 0x475f73 in worker_run /export/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1585
    #10 0x6961e4 in fridgethr_start_routine /export/nfs-ganesha/src/support/fridgethr.c:550:3
    #11 0x7ffff6bab6c9 in start_thread (/lib64/libpthread.so.0+0x76c9)

previously allocated by thread T35 here:
    #0 0x7ffff6ea19b0 in calloc (/usr/lib64/clang/3.8.1/lib/libclang_rt.asan-x86_64.so+0xdf9b0)
    #1 0x6dbef3 in gsh_calloc__ /export/nfs-ganesha/src/include/abstract_mem.h:145:12
    #2 0x6dbef3 in pool_alloc__ /export/nfs-ganesha/src/include/abstract_mem.h:395
    #3 0x6dbef3 in alloc_cache_entry /export/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1172
    #4 0x6dbef3 in mdcache_lru_get /export/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c:1212
    #5 0x6f55ff in mdcache_alloc_handle /export/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:117:2
    #6 0x6f55ff in mdcache_new_entry /export/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_helpers.c:412
    #7 0x6e2d5d in mdcache_alloc_and_check_handle /export/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1
    #8 0x6f0a9f in mdcache_open2 /export/nfs-ganesha/src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_file.c:702:11
    #9 0x440098 in open2_by_name /export/nfs-ganesha/src/FSAL/fsal_helper.c:399:11
    #10 0x44881b in fsal_open2 /export/nfs-ganesha/src/FSAL/fsal_helper.c:1808:10
    #11 0x5d15b1 in _9p_lcreate /export/nfs-ganesha/src/Protocols/9P/9p_lcreate.c:134:17
    #12 0x5c926b in _9p_process_buffer /export/nfs-ganesha/src/Protocols/9P/9p_interpreter.c:180:7
    #13 0x522da4 in _9p_rdma_process_request /export/nfs-ganesha/src/MainNFSD/9p_rdma_callbacks.c:158:8
    #14 0x475f73 in _9p_execute /export/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1474:3
    #15 0x475f73 in worker_run /export/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1585
    #16 0x6961e4 in fridgethr_start_routine /export/nfs-ganesha/src/support/fridgethr.c:550:3
    #17 0x7ffff6bab6c9 in start_thread (/lib64/libpthread.so.0+0x76c9)

Thread T15 created by T0 here:
    #0 0x7ffff6e16009 in pthread_create (/usr/lib64/clang/3.8.1/lib/libclang_rt.asan-x86_64.so+0x54009)
    #1 0x695506 in fridgethr_populate /export/nfs-ganesha/src/support/fridgethr.c:1418:8
    #2 0x475724 in worker_init /export/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1642:7
    #3 0x514dca in nfs_Start_threads /export/nfs-ganesha/src/MainNFSD/nfs_init.c:457:7
    #4 0x514dca in nfs_start /export/nfs-ganesha/src/MainNFSD/nfs_init.c:879
    #5 0x41daac in main /export/nfs-ganesha/src/MainNFSD/nfs_main.c:479:2
    #6 0x7ffff4283400 in __libc_start_main (/lib64/libc.so.6+0x20400)

Thread T35 created by T0 here:
    #0 0x7ffff6e16009 in pthread_create (/usr/lib64/clang/3.8.1/lib/libclang_rt.asan-x86_64.so+0x54009)
    #1 0x695506 in fridgethr_populate /export/nfs-ganesha/src/support/fridgethr.c:1418:8
    #2 0x475724 in worker_init /export/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1642:7
    #3 0x514dca in nfs_Start_threads /export/nfs-ganesha/src/MainNFSD/nfs_init.c:457:7
    #4 0x514dca in nfs_start /export/nfs-ganesha/src/MainNFSD/nfs_init.c:879
    #5 0x41daac in main /export/nfs-ganesha/src/MainNFSD/nfs_main.c:479:2
    #6 0x7ffff4283400 in __libc_start_main (/lib64/libc.so.6+0x20400)

(It crashes in ~5 mins for me. I could provide full logs, but it might be easier to reproduce, given we don't log ref counting at all.)

Thanks for any time spent on this; it'd be nice™ to have it fixed before we release 2.5. We've got other problems here so time is scarce, but if need be I'll find some time again around cthon's dates.

Cheers,
--
Dominique