I will certainly pick up the fix and try it asap, thanks for pointing it out.
Regards.
Krishna Harathi

On Fri, Jun 19, 2015 at 11:50 AM, Omkar Joshi <o...@hedviginc.com> wrote:

> check cache inode related bug which I had raised on red hat bugzilla. It
> may be related to that... I have posted the fix for this.
>
> On Thu, Jun 18, 2015 at 8:15 AM, Krishna Harathi <khara...@exablox.com>
> wrote:
>
>> Hi Malahal,
>>
>> We are using VFS FSAL.
>>
>> In the original email, I noted that the parent cache entry in question is qid
>> = LRU_ENTRY_CLEANUP, so I guess the cache entry is in the cleanup queue,
>> whereas this thread in question is trying to access the entry to fill up
>> some post-op attributes in the NFS reply.
>>
>> The workload is "rm -rf" of 1000K files that runs for a couple of days,
>> with other IO in parallel, it does not crash always and it is hard to
>> re-create. My guess of what is happening here is - a file is getting
>> removed
>>
>> Also, we cannot pick up Ganesha 2.2 because of our release cycles, it is
>> in the plan, it came out only recently.
>> Having said that, the crash you see is with Ganesha 2.1.0 + refcount and
>> other patches from 2.2.0. As you said, I also suspect 2.2 may not fix this
>> issue,
>>
>> I just need help in debugging, my question is - when a directory entry is
>> getting removed, at the time of filling up the postOp attributes from the
>> parent Directory cache entry, what lock is supposed to be held on the
>> parent entry? Also, the remove operation has returned with error from the
>> VFS FSAL.
>>
>> valgrind does not show any use-after-free errors or any other significant
>> errors, only a bunch of allocated-but-not-freed memory at the end on normal
>> exit, and that is usual for Ganesha I guess?
>>
>> Regards.
>> Krishna Harathi
>>
>> On Wed, Jun 17, 2015 at 6:36 AM, Malahal Naineni <mala...@us.ibm.com>
>> wrote:
>>
>>> Hi Krishna, The code doesn't seem to match exactly with V2.1.0 but it
>>> does look like nfs3_remove() entered label "out_fail". Wondering what
>>> the cache_status was at the time of the crash.
>>>
>>> There were some fixes in V2.2-stable related refcounting, but I am not
>>> sure if V2.2-stable fixes your issues.
>>>
>>> What FSAL are you using? Also, if you can reproduce this under valgrind,
>>> that should give us more information to see if we are using the freed
>>> entry itself here.
>>>
>>> As I said, I don't see any commit in particular that fixes this issue but
>>> V2.2-stable is the current release (and it is our long term release!)
>>>
>>> Regards, Malahal.
>>>
>>> Krishna Harathi [khara...@exablox.com] wrote:
>>> > Using Ganesha version 2.1.0, NFSv3 exports and clients.
>>> > We are seeing the following crash where Ganesha is trying to access parent
>>> > inode to SetPostOpAttr() and in the crash, we see that the parent
>>> > obj_handle is NULL.
>>> > Is this a known issue, and are there any recent fixes in this area? Any
>>> > help is appreciated.
>>> >
>>> > Thread 1 (LWP 6688):
>>> > #0  0x0050ad94 in cache_inode_is_attrs_valid (entry=0x6b424500)
>>> >     at /git/packaging/nfs-ganesha/nfs-ganesha/src/include/cache_inode.h:939
>>> > #1  0x0050e5d8 in cache_inode_lock_trust_attrs (entry=0x6b424500, need_wr_lock=false)
>>> >     at /git/packaging/nfs-ganesha/nfs-ganesha/src/cache_inode/cache_inode_misc.c:887
>>> > #2  0x004a1e04 in cache_entry_to_nfs3_Fattr (entry=0x6b424500, Fattr=0x698092f0)
>>> >     at /git/packaging/nfs-ganesha/nfs-ganesha/src/Protocols/NFS/nfs_proto_tools.c:3567
>>> > #3  0x0049a940 in nfs_SetPostOpAttr (entry=0x6b424500, attr=0x698092e8)
>>> >     at /git/packaging/nfs-ganesha/nfs-ganesha/src/Protocols/NFS/nfs_proto_tools.c:79
>>> > #4  0x0049abc8 in nfs_SetWccData (before_attr=0x70ffdc00, entry=0x6b424500, wcc_data=0x698092c8)
>>> >     at /git/packaging/nfs-ganesha/nfs-ganesha/src/Protocols/NFS/nfs_proto_tools.c:132
>>> > #5  0x00466bbc in nfs3_remove (arg=0x5fc90358, worker=0x6f008140, req=0x5fc902e8, res=0x698092c0)
>>> >     at /git/packaging/nfs-ganesha/nfs-ganesha/src/Protocols/NFS/nfs3_remove.c:161
>>> > #6  0x0045b340 in nfs_rpc_execute (req=0x5fc72d30, worker_data=0x6f008140)
>>> >     at /git/packaging/nfs-ganesha/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1257
>>> > #7  0x0045bfa8 in worker_run (ctx=0x76562f00)
>>> >     at /git/packaging/nfs-ganesha/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1506
>>> > #8  0x00542684 in fridgethr_start_routine (arg=0x76562f00)
>>> >     at /git/packaging/nfs-ganesha/nfs-ganesha/src/support/fridgethr.c:562
>>> > #9  0x76f47368 in start_thread () from /lib/mips-linux-gnu/libpthread.so.0
>>> > #10 0x76e9af18 in fcvt_r () from /lib/mips-linux-gnu/libc.so.6
>>> > #11 0x00000000 in ?? ()
>>> > (gdb) f 0
>>> > #0  0x0050ad94 in cache_inode_is_attrs_valid (entry=0x6b424500)
>>> >     at /git/packaging/nfs-ganesha/nfs-ganesha/src/include/cache_inode.h:939
>>> > 939 in /git/packaging/nfs-ganesha/nfs-ganesha/src/include/cache_inode.h
>>> >
>>> > (gdb) p entry->obj_handle
>>> > $1 = (struct fsal_obj_handle *) 0x0
>>> >
>>> > Regards.
>>> > Krishna Harathi
>
> --
> Thanks,
> Omkar
------------------------------------------------------------------------------
_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel