This list has been deprecated. Please subscribe to the new devel list at 
lists.nfs-ganesha.org.
Hi All

Thanks for the suggestions, I'll have a go and report back. Sharing some
more info that may or may not be relevant:

The clients I'm accessing Ganesha with were previously accessing a kernel
NFS server running on CentOS. I was experiencing frequent issues where
lockd on the server would go into uninterruptible sleep, and I needed to
stop the nfslock service, clear out the /var/lib/nfs/statd dirs and start
nfslock again to get things working.
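
For reference, the recovery sequence described above could be scripted roughly
as follows. This is only a sketch: the use of the `service` wrapper, the sm/
and sm.bak/ subdirectory names and the DRY_RUN switch are assumptions, not
something taken from the server in question. By default it only prints the
commands.

```shell
#!/bin/sh
# Sketch of the manual lockd recovery: stop nfslock, clear statd state,
# start nfslock again.
# Assumptions: the `service` wrapper is available and the statd monitor
# records live in sm/ and sm.bak/ under STATD_DIR.
STATD_DIR="${STATD_DIR:-/var/lib/nfs/statd}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reset_nfslock() {
    run service nfslock stop
    # Delete the per-client records but keep the directories themselves.
    run find "$STATD_DIR/sm" "$STATD_DIR/sm.bak" -mindepth 1 -delete
    run service nfslock start
}

reset_nfslock
```

Running it with DRY_RUN=0 as root would actually perform the three steps.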

Now that these clients are accessing the nfs-ganesha server, I'm seeing
similar behaviour: the clients were showing "lockd: server *ipaddr* not
responding" and I had to restart nfs-ganesha to resolve it. I don't know
if these crashes are related to that in any way.

The other thing to note is some of the exports are on a cephfs mount but
I'm using the VFS FSAL, not the CEPH FSAL.

When it is working, performance seems good, and the crashes don't appear to
happen during periods of high I/O.
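
On the ASAN rebuild suggested in the quoted message below, a minimal sketch of
the steps might look like this. Only the libasan package and the
-DSANITIZE_ADDRESS=ON flag come from the thread; the source and build paths,
the use of yum, the Debug build type and the DRY_RUN switch are assumptions.
By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of an ASAN-enabled rebuild of nfs-ganesha.
# Assumptions: SRC_DIR is the directory holding the top-level
# CMakeLists.txt, BUILD_DIR is a scratch build directory, and yum is the
# package manager.
SRC_DIR="${SRC_DIR:-$HOME/nfs-ganesha/src}"
BUILD_DIR="${BUILD_DIR:-$HOME/nfs-ganesha/build-asan}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_with_asan() {
    run yum install -y libasan
    run mkdir -p "$BUILD_DIR"
    run cd "$BUILD_DIR"
    # A Debug build keeps symbols so that ASAN reports show source lines.
    run cmake "$SRC_DIR" -DSANITIZE_ADDRESS=ON -DCMAKE_BUILD_TYPE=Debug
    run make
}

build_with_asan
```

Setting DRY_RUN=0 would run the steps for real; the install step needs root.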

Thanks,



On Mon, Oct 1, 2018 at 4:30 PM Malahal Naineni <mala...@gmail.com> wrote:

> David, another option is to test with Ganesha 2.7, as you are able to
> recreate this easily with V2.6.3.
>
> On Mon, Oct 1, 2018 at 7:49 PM Daniel Gryniewicz <d...@redhat.com> wrote:
>
>> I'm not seeing any easy way that cmpf could be corrupted.  The structure
>> before it is fairly complex, with its last element being an integer, so
>> it's unlikely that something wrote off the end of that.  That leaves a
>> random memory corruption, which is almost impossible to detect.
>>
>> David, can you rebuild your Ganesha?  If so, can you build with the
>> Address Sanitizer on?  To do this, install libasan on your distro, and
>> then pass -DSANITIZE_ADDRESS=ON to cmake.  With ASAN enabled, you may
>> get a crash at the time of corruption, rather than at some future point.
>>
>> Daniel
>>
>> On 10/01/2018 09:20 AM, Malahal Naineni wrote:
>> >
>> > Looking at the code, head->cmpf should be the address of the
>> > "clnt_req_xid_cmpf" function. Your gdb output didn't show that, and I
>> > don't know how that could happen with the V2.6.3 code. @Dan, any
>> > insights into this issue?
>> >
>> > On Mon, Oct 1, 2018 at 2:22 PM David C <dcsysengin...@gmail.com
>> > <mailto:dcsysengin...@gmail.com>> wrote:
>> >
>> >     Hi Malahal
>> >
>> >     Result of that command:
>> >
>> >     (gdb) p head->cmpf
>> >     $1 = (opr_rbtree_cmpf_t) 0x31fb0b405ba000b7
>> >
>> >     Thanks,
>> >
>> >     On Mon, Oct 1, 2018 at 5:55 AM Malahal Naineni <mala...@gmail.com
>> >     <mailto:mala...@gmail.com>> wrote:
>> >
>> >         Looks like the head is messed up. Run these in gdb and let us
>> >         know the second command's output: 1. "frame 0"   2.
>> >         "p head->cmpf".  I believe the head->cmpf function pointer is
>> >         NULL or bad, leading to this segfault. I haven't seen this crash
>> >         before and have never used Ganesha version 2.6.
>> >
>> >         Regards, Malahal.
>> >
>> >         On Mon, Oct 1, 2018 at 1:25 AM David C <dcsysengin...@gmail.com
>> >         <mailto:dcsysengin...@gmail.com>> wrote:
>> >
>> >             Hi Malahal
>> >
>> >             I've set up ABRT so I'm now getting coredumps for the
>> >             crashes. I've installed the debuginfo packages for
>> >             nfs-ganesha and libntirpc.
>> >
>> >             I'd be really grateful if you could give me some guidance on
>> >             debugging this.
>> >
>> >             Some info on the latest crash:
>> >
>> >             The following was echoed to the kernel log:
>> >
>> >                 traps: ganesha.nfsd[28589] general protection
>> >                 ip:7fcf2421dded sp:7fcd9d4d03a0 error:0 in
>> >                 libntirpc.so.1.6.3[7fcf2420d000+3d000]
>> >
>> >
>> >             Last lines of output from # gdb /usr/bin/ganesha.nfsd coredump:
>> >
>> >             [Thread debugging using libthread_db enabled]
>> >             Using host libthread_db library "/lib64/libthread_db.so.1".
>> >             Core was generated by `/usr/bin/ganesha.nfsd -L
>> >             /var/log/ganesha/ganesha.log -f /etc/ganesha/ganesha.c'.
>> >             Program terminated with signal 11, Segmentation fault.
>> >             #0  0x00007fcf2421dded in opr_rbtree_insert
>> >             (head=head@entry=0x7fcef800c528,
>> >             node=node@entry=0x7fce68004750) at
>> >             /usr/src/debug/ntirpc-1.6.3/src/rbtree.c:271
>> >             271                     switch (head->cmpf(node, parent)) {
>> >             Missing separate debuginfos, use: debuginfo-install
>> >             bzip2-libs-1.0.6-13.el7.x86_64
>> >             dbus-libs-1.10.24-7.el7.x86_64
>> >             elfutils-libelf-0.170-4.el7.x86_64
>> >             elfutils-libs-0.170-4.el7.x86_64 glibc-2.17-222.el7.x86_64
>> >             gssproxy-0.7.0-17.el7.x86_64
>> >             keyutils-libs-1.5.8-3.el7.x86_64
>> >             krb5-libs-1.15.1-19.el7.x86_64 libattr-2.4.46-13.el7.x86_64
>> >             libblkid-2.23.2-52.el7.x86_64 libcap-2.22-9.el7.x86_64
>> >             libcom_err-1.42.9-12.el7_5.x86_64
>> >             libgcc-4.8.5-28.el7_5.1.x86_64 libgcrypt-1.5.3-14.el7.x86_64
>> >             libgpg-error-1.12-3.el7.x86_64
>> >             libnfsidmap-0.25-19.el7.x86_64 libselinux-2.5-12.el7.x86_64
>> >             libuuid-2.23.2-52.el7.x86_64 lz4-1.7.5-2.el7.x86_64
>> >             pcre-8.32-17.el7.x86_64 systemd-libs-219-57.el7.x86_64
>> >             xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
>> >
>> >             Output from bt:
>> >
>> >             (gdb) bt
>> >             #0  0x00007fcf2421dded in opr_rbtree_insert
>> >             (head=head@entry=0x7fcef800c528,
>> >             node=node@entry=0x7fce68004750) at
>> >             /usr/src/debug/ntirpc-1.6.3/src/rbtree.c:271
>> >             #1  0x00007fcf24218eac in clnt_req_setup
>> >             (cc=cc@entry=0x7fce68004720, timeout=...) at
>> >             /usr/src/debug/ntirpc-1.6.3/src/clnt_generic.c:515
>> >             #2  0x000055d62490347f in nsm_unmonitor
>> >             (host=host@entry=0x7fce00018ea0) at
>> >             /usr/src/debug/nfs-ganesha-2.6.3/src/Protocols/NLM/nsm.c:219
>> >             #3  0x000055d6249425cf in dec_nsm_client_ref
>> >             (client=0x7fce00018ea0) at
>> >             /usr/src/debug/nfs-ganesha-2.6.3/src/SAL/nlm_owner.c:857
>> >             #4  0x000055d624942f61 in free_nlm_client
>> >             (client=0x7fce00017500) at
>> >             /usr/src/debug/nfs-ganesha-2.6.3/src/SAL/nlm_owner.c:1039
>> >             #5  0x000055d6249431d3 in dec_nlm_client_ref
>> >             (client=0x7fce00017500) at
>> >             /usr/src/debug/nfs-ganesha-2.6.3/src/SAL/nlm_owner.c:1130
>> >             #6  0x000055d6249439ae in free_nlm_owner
>> >             (owner=owner@entry=0x7fce00024bc0) at
>> >             /usr/src/debug/nfs-ganesha-2.6.3/src/SAL/nlm_owner.c:1314
>> >             #7  0x000055d624929a48 in free_state_owner
>> >             (owner=0x7fce00024bc0) at
>> >             /usr/src/debug/nfs-ganesha-2.6.3/src/SAL/state_misc.c:818
>> >             #8  0x000055d624929dc0 in dec_state_owner_ref
>> >             (owner=0x7fce00024bc0) at
>> >             /usr/src/debug/nfs-ganesha-2.6.3/src/SAL/state_misc.c:968
>> >             #9  0x000055d6248ff173 in nlm4_Unlock (args=0x7fce68003b98,
>> >             req=0x7fce68003490, res=0x7fce68000d70) at
>> >             /usr/src/debug/nfs-ganesha-2.6.3/src/Protocols/NLM/nlm_Unlock.c:127
>> >             #10 0x000055d6248c0f0f in nfs_rpc_process_request
>> >             (reqdata=0x7fce68003490) at
>> >             /usr/src/debug/nfs-ganesha-2.6.3/src/MainNFSD/nfs_worker_thread.c:1329
>> >             #11 0x000055d6248c02ba in nfs_rpc_decode_request
>> >             (xprt=0x7fcef011b600, xdrs=0x7fce68001480) at
>> >             /usr/src/debug/nfs-ganesha-2.6.3/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1341
>> >             #12 0x00007fcf2422dbcd in svc_rqst_xprt_task
>> >             (wpe=0x7fcef011b818) at
>> >             /usr/src/debug/ntirpc-1.6.3/src/svc_rqst.c:751
>> >             #13 0x00007fcf2422df2a in svc_rqst_epoll_events
>> >             (n_events=<optimized out>, sr_rec=0x55d6253b3fd0) at
>> >             /usr/src/debug/ntirpc-1.6.3/src/svc_rqst.c:923
>> >             #14 svc_rqst_epoll_loop (sr_rec=<optimized out>) at
>> >             /usr/src/debug/ntirpc-1.6.3/src/svc_rqst.c:996
>> >             #15 svc_rqst_run_task (wpe=0x55d6253b3fd0) at
>> >             /usr/src/debug/ntirpc-1.6.3/src/svc_rqst.c:1032
>> >             #16 0x00007fcf2423671a in work_pool_thread
>> >             (arg=0x55d6282753f0) at
>> >             /usr/src/debug/ntirpc-1.6.3/src/work_pool.c:176
>> >             #17 0x00007fcf2465ce25 in start_thread () from
>> >             /lib64/libpthread.so.0
>> >             #18 0x00007fcf23d28bad in clone () from /lib64/libc.so.6
>> >
>> >             Thanks for your assistance so far on this
>> >             David
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >             On Fri, Sep 28, 2018 at 8:06 PM David C
>> >             <dcsysengin...@gmail.com <mailto:dcsysengin...@gmail.com>>
>> >             wrote:
>> >
>> >                 Thanks, Malahal. I'll get the coredumps enabled. I've
>> >                 had a few more crashes today; hopefully they'll shed
>> >                 some light on the issue.
>> >
>> >                 On Fri, Sep 28, 2018 at 1:20 PM Malahal Naineni
>> >                 <mala...@gmail.com <mailto:mala...@gmail.com>> wrote:
>> >
>> >                     You need to enable coredumps for ganesha. Here are
>> >                     some instructions. Step 2 is NOT needed as your
>> >                     packages are signed:
>> >
>> >                     https://ganltc.github.io/setup-to-take-ganesha-coredumps.html
>> >
>> >                     On Fri, Sep 28, 2018 at 4:38 PM David C
>> >                     <dcsysengin...@gmail.com
>> >                     <mailto:dcsysengin...@gmail.com>> wrote:
>> >
>> >                         Hi All
>> >
>> >                         CentOS 7.5
>> >                         nfs-ganesha-2.6.3-1.el7.x86_64
>> >                         nfs-ganesha-vfs-2.6.3-1.el7.x86_64
>> >                         libntirpc-1.6.3-1.el7.x86_64
>> >
>> >                         My Ganesha service crashed and the following was
>> >                         echoed to my kernel log:
>> >
>> >                             ganesha.nfsd[28752]: segfault at 0
>> >                             ip           (null) sp 00007ff9a2af8458
>> >                             error 14 in ganesha.nfsd[559170ef3000+1a4000]
>> >
>> >
>> >                         Nothing in my ganesha.log
>> >
>> >                         These are the log settings from my ganesha.conf:
>> >
>> >                             LOG {
>> >                                      ## Default log level for all components
>> >                                      Default_Log_Level = DEBUG;
>> >
>> >                                      ## Configure per-component log levels.
>> >                                      #Components {
>> >                                              #FSAL = INFO;
>> >                                              #NFS4 = EVENT;
>> >                                      #}
>> >
>> >                                      ## Where to log
>> >                                      Facility {
>> >                                              name = FILE;
>> >                                              destination = "/var/log/ganesha.log";
>> >                                              enable = active;
>> >                                      }
>> >                             }
>> >
>> >
>> >                         This is an example of one of my exports (they're
>> >                         all NFSv3 with the VFS FSAL):
>> >
>> >                             EXPORT
>> >                             {
>> >                                      Export_Id = 80;
>> >                                      Path = /mnt/dir;
>> >                                      Pseudo = /mnt/dir;
>> >                                      Access_Type = RW;
>> >                                      Protocols = 3;
>> >                                      Transports = TCP;
>> >                                      Squash = no_root_squash;
>> >                                      Disable_ACL=False;
>> >                                      Filesystem_Id = 101.1;
>> >                                      CLIENT {
>> >                                         Clients = *;
>> >                                         Squash = None;
>> >                                         Access_Type = RW;
>> >                                      }
>> >                                      FSAL {
>> >                                            Name = VFS;
>> >                                       }
>> >                             }
>> >
>> >
>> >                         The exports are mounted on CentOS 7.4 clients
>> >                         with autofs-5.0.7 and
>> >                         nfs-utils-1.3.0-0.48.el7_4.x86_64
>> >
>> >                         This crash occurred approx 2 hours after I
>> >                         increased the number of clients accessing the
>> >                         server by approx five. I don't know if that's
>> >                         related.
>> >
>> >                         Could someone help me troubleshoot this please?
>> >
>> >                         Many thanks
>> >                         David
>> >
>> >
>> >
>> >
>> >
>> >                         _______________________________________________
>> >                         Nfs-ganesha-devel mailing list
>> >                         Nfs-ganesha-devel@lists.sourceforge.net
>> >                         https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
