This list has been deprecated. Please subscribe to the new devel list at 
lists.nfs-ganesha.org.

I'm not seeing any easy way that cmpf could be corrupted. The structure before it is fairly complex, with its last element being an integer, so it's unlikely that something wrote off the end of that. That leaves random memory corruption, which is almost impossible to detect.

David, can you rebuild your Ganesha? If so, can you build with the Address Sanitizer on? To do this, install libasan on your distro, and then pass -DSANITIZE_ADDRESS=ON to cmake. With ASAN enabled, you may get a crash at the time of corruption, rather than at some future point.
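
As a rough illustration of what ASAN changes, here is a minimal sketch of the kind of unchecked heap write it is designed to report. This is not Ganesha or ntirpc code; `fill_unchecked` and `example` are hypothetical names made up for the illustration. Built normally, an oversized write may silently damage a neighbouring allocation and only show up later; built with `-fsanitize=address`, the process should stop at the offending write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, deliberately missing a bounds check.
 * Allocates `cap` bytes and writes `n` bytes of `value` into them.
 *  - n <= cap: well defined.
 *  - n >  cap: heap buffer overflow. Without ASAN this may quietly
 *    overwrite adjacent heap memory; with ASAN (compile and link with
 *    -fsanitize=address -g) the write itself is reported.
 * The caller owns the returned buffer and must free() it.
 */
static unsigned char *fill_unchecked(size_t cap, size_t n, unsigned char value)
{
	unsigned char *buf = malloc(cap);

	if (buf == NULL)
		return NULL;
	memset(buf, value, n);	/* no check that n <= cap */
	return buf;
}

/* Safe usage: the write stays inside the allocation. */
static void example(void)
{
	unsigned char *p = fill_unchecked(16, 16, 0xAB);

	assert(p != NULL);
	assert(p[0] == 0xAB && p[15] == 0xAB);
	free(p);
}
```

Calling `fill_unchecked(16, 32, 0xAB)` instead is the buggy case: that is the sort of write that could turn a function pointer such as cmpf into garbage long before anything crashes.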

Daniel

On 10/01/2018 09:20 AM, Malahal Naineni wrote:

Looking at the code, head->cmpf should hold the address of the "clnt_req_xid_cmpf" function. Your gdb output didn't show that, and I don't know how that could happen with the V2.6.3 code. @Dan, any insights into this issue?
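
One way to see that the reported value 0x31fb0b405ba000b7 cannot be a real function address: on x86-64 with 48-bit virtual addresses, bits 63..47 of a valid ("canonical") address must be all zeros or all ones. Jumping through a non-canonical pointer raises a general protection fault rather than an ordinary page fault, which would fit the "general protection" line in the kernel log. The sketch below assumes 4-level paging (48-bit addresses); `is_canonical_48` and `example` are hypothetical helper names, not part of gdb or ntirpc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: x86-64 with 48-bit virtual addresses (4-level paging).
 * An address is canonical when bits 63..47 are all 0 or all 1.
 */
static bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the 17 bits 63..47 */

	return top == 0 || top == 0x1FFFF;
}

/* Values taken from the gdb session quoted in this thread. */
static void example(void)
{
	/* head->cmpf as printed by gdb: not a usable address */
	assert(!is_canonical_48(0x31fb0b405ba000b7ULL));
	/* frame #0 instruction pointer: an ordinary user-space address */
	assert(is_canonical_48(0x00007fcf2421ddedULL));
}
```

This only shows that the pointer is garbage, not how it got that way.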

On Mon, Oct 1, 2018 at 2:22 PM David C <dcsysengin...@gmail.com <mailto:dcsysengin...@gmail.com>> wrote:

    Hi Malahal

    Result of that command:

    (gdb) p head->cmpf
    $1 = (opr_rbtree_cmpf_t) 0x31fb0b405ba000b7

    Thanks,

    On Mon, Oct 1, 2018 at 5:55 AM Malahal Naineni <mala...@gmail.com
    <mailto:mala...@gmail.com>> wrote:

        Looks like the head is messed up. Run these two commands in
        gdb and let us know the output of the second one:
        1. "frame 0"   2. "p head->cmpf".
        I believe the head->cmpf function pointer is NULL or bad,
        leading to this segfault. I haven't seen this crash before
        and have never used Ganesha version 2.6.

        Regards, Malahal.
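
To make the failure mode concrete, here is a minimal sketch of a tree insert that dispatches through a comparator stored in the tree head, in the same spirit as the `head->cmpf(node, parent)` call at rbtree.c:271. It is a plain unbalanced binary search tree, not the ntirpc red-black tree; all type and function names (`struct tree`, `cmpf_t`, `cmp_key`, `tree_insert`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *left, *right;
	int key;
};

typedef int (*cmpf_t)(const struct node *a, const struct node *b);

struct tree {
	struct node *root;
	cmpf_t cmpf;	/* called on every insert */
};

/* Three-way comparison on the key: negative, zero, or positive. */
static int cmp_key(const struct node *a, const struct node *b)
{
	return (a->key > b->key) - (a->key < b->key);
}

/*
 * Insert n into t. Returns the existing node with an equal key,
 * or NULL if n was inserted.
 */
static struct node *tree_insert(struct tree *t, struct node *n)
{
	struct node **link = &t->root;

	/* Catches a NULL comparator only; a garbage pointer passes this. */
	assert(t->cmpf != NULL);
	n->left = n->right = NULL;
	while (*link != NULL) {
		/* Indirect call: a corrupted t->cmpf crashes here. */
		int c = t->cmpf(n, *link);

		if (c == 0)
			return *link;
		link = (c < 0) ? &(*link)->left : &(*link)->right;
	}
	*link = n;
	return NULL;
}
```

Note that the first insert into an empty tree never calls the comparator, so a bad pointer in the head can sit unnoticed until a second node is inserted.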

        On Mon, Oct 1, 2018 at 1:25 AM David C <dcsysengin...@gmail.com
        <mailto:dcsysengin...@gmail.com>> wrote:

            Hi Malahal

            I've set up ABRT, so I'm now getting coredumps for the
            crashes. I've installed the debuginfo packages for
            nfs-ganesha and libntirpc.

            I'd be really grateful if you could give me some guidance on
            debugging this.

            Some info on the latest crash:

            The following was echoed to the kernel log:

                traps: ganesha.nfsd[28589] general protection
                ip:7fcf2421dded sp:7fcd9d4d03a0 error:0 in
                libntirpc.so.1.6.3[7fcf2420d000+3d000]


            Last lines of output from # gdb /usr/bin/ganesha.nfsd coredump:

            [Thread debugging using libthread_db enabled]
            Using host libthread_db library "/lib64/libthread_db.so.1".
            Core was generated by `/usr/bin/ganesha.nfsd -L
            /var/log/ganesha/ganesha.log -f /etc/ganesha/ganesha.c'.
            Program terminated with signal 11, Segmentation fault.
            #0  0x00007fcf2421dded in opr_rbtree_insert
            (head=head@entry=0x7fcef800c528,
            node=node@entry=0x7fce68004750) at
            /usr/src/debug/ntirpc-1.6.3/src/rbtree.c:271
            271                     switch (head->cmpf(node, parent)) {
            Missing separate debuginfos, use: debuginfo-install
            bzip2-libs-1.0.6-13.el7.x86_64
            dbus-libs-1.10.24-7.el7.x86_64
            elfutils-libelf-0.170-4.el7.x86_64
            elfutils-libs-0.170-4.el7.x86_64 glibc-2.17-222.el7.x86_64
            gssproxy-0.7.0-17.el7.x86_64
            keyutils-libs-1.5.8-3.el7.x86_64
            krb5-libs-1.15.1-19.el7.x86_64 libattr-2.4.46-13.el7.x86_64
            libblkid-2.23.2-52.el7.x86_64 libcap-2.22-9.el7.x86_64
            libcom_err-1.42.9-12.el7_5.x86_64
            libgcc-4.8.5-28.el7_5.1.x86_64 libgcrypt-1.5.3-14.el7.x86_64
            libgpg-error-1.12-3.el7.x86_64
            libnfsidmap-0.25-19.el7.x86_64 libselinux-2.5-12.el7.x86_64
            libuuid-2.23.2-52.el7.x86_64 lz4-1.7.5-2.el7.x86_64
            pcre-8.32-17.el7.x86_64 systemd-libs-219-57.el7.x86_64
            xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64

            Output from bt:

            (gdb) bt
            #0  0x00007fcf2421dded in opr_rbtree_insert
            (head=head@entry=0x7fcef800c528,
            node=node@entry=0x7fce68004750) at
            /usr/src/debug/ntirpc-1.6.3/src/rbtree.c:271
            #1  0x00007fcf24218eac in clnt_req_setup
            (cc=cc@entry=0x7fce68004720, timeout=...) at
            /usr/src/debug/ntirpc-1.6.3/src/clnt_generic.c:515
            #2  0x000055d62490347f in nsm_unmonitor
            (host=host@entry=0x7fce00018ea0) at
            /usr/src/debug/nfs-ganesha-2.6.3/src/Protocols/NLM/nsm.c:219
            #3  0x000055d6249425cf in dec_nsm_client_ref
            (client=0x7fce00018ea0) at
            /usr/src/debug/nfs-ganesha-2.6.3/src/SAL/nlm_owner.c:857
            #4  0x000055d624942f61 in free_nlm_client
            (client=0x7fce00017500) at
            /usr/src/debug/nfs-ganesha-2.6.3/src/SAL/nlm_owner.c:1039
            #5  0x000055d6249431d3 in dec_nlm_client_ref
            (client=0x7fce00017500) at
            /usr/src/debug/nfs-ganesha-2.6.3/src/SAL/nlm_owner.c:1130
            #6  0x000055d6249439ae in free_nlm_owner
            (owner=owner@entry=0x7fce00024bc0) at
            /usr/src/debug/nfs-ganesha-2.6.3/src/SAL/nlm_owner.c:1314
            #7  0x000055d624929a48 in free_state_owner
            (owner=0x7fce00024bc0) at
            /usr/src/debug/nfs-ganesha-2.6.3/src/SAL/state_misc.c:818
            #8  0x000055d624929dc0 in dec_state_owner_ref
            (owner=0x7fce00024bc0) at
            /usr/src/debug/nfs-ganesha-2.6.3/src/SAL/state_misc.c:968
            #9  0x000055d6248ff173 in nlm4_Unlock (args=0x7fce68003b98,
            req=0x7fce68003490, res=0x7fce68000d70) at
            /usr/src/debug/nfs-ganesha-2.6.3/src/Protocols/NLM/nlm_Unlock.c:127
            #10 0x000055d6248c0f0f in nfs_rpc_process_request
            (reqdata=0x7fce68003490) at
            /usr/src/debug/nfs-ganesha-2.6.3/src/MainNFSD/nfs_worker_thread.c:1329
            #11 0x000055d6248c02ba in nfs_rpc_decode_request
            (xprt=0x7fcef011b600, xdrs=0x7fce68001480) at
            /usr/src/debug/nfs-ganesha-2.6.3/src/MainNFSD/nfs_rpc_dispatcher_thread.c:1341
            #12 0x00007fcf2422dbcd in svc_rqst_xprt_task
            (wpe=0x7fcef011b818) at
            /usr/src/debug/ntirpc-1.6.3/src/svc_rqst.c:751
            #13 0x00007fcf2422df2a in svc_rqst_epoll_events
            (n_events=<optimized out>, sr_rec=0x55d6253b3fd0) at
            /usr/src/debug/ntirpc-1.6.3/src/svc_rqst.c:923
            #14 svc_rqst_epoll_loop (sr_rec=<optimized out>) at
            /usr/src/debug/ntirpc-1.6.3/src/svc_rqst.c:996
            #15 svc_rqst_run_task (wpe=0x55d6253b3fd0) at
            /usr/src/debug/ntirpc-1.6.3/src/svc_rqst.c:1032
            #16 0x00007fcf2423671a in work_pool_thread
            (arg=0x55d6282753f0) at
            /usr/src/debug/ntirpc-1.6.3/src/work_pool.c:176
            #17 0x00007fcf2465ce25 in start_thread () from
            /lib64/libpthread.so.0
            #18 0x00007fcf23d28bad in clone () from /lib64/libc.so.6

            Thanks for your assistance so far on this
            David








            On Fri, Sep 28, 2018 at 8:06 PM David C
            <dcsysengin...@gmail.com <mailto:dcsysengin...@gmail.com>>
            wrote:

                Thanks, Malahal. I'll get the coredumps enabled. I've
                had a few more crashes today; hopefully they'll shed
                some light on the issue.

                On Fri, Sep 28, 2018 at 1:20 PM Malahal Naineni
                <mala...@gmail.com <mailto:mala...@gmail.com>> wrote:

                    You need to enable coredumps for ganesha. Here are
                    some instructions. Step 2 is NOT needed as your
                    packages are signed:

                    https://ganltc.github.io/setup-to-take-ganesha-coredumps.html

                    On Fri, Sep 28, 2018 at 4:38 PM David C
                    <dcsysengin...@gmail.com
                    <mailto:dcsysengin...@gmail.com>> wrote:

                        Hi All

                        CentOS 7.5
                        nfs-ganesha-2.6.3-1.el7.x86_64
                        nfs-ganesha-vfs-2.6.3-1.el7.x86_64
                        libntirpc-1.6.3-1.el7.x86_64

                        My Ganesha service crashed and the following was
                        echoed to my kernel log:

                            ganesha.nfsd[28752]: segfault at 0
                            ip           (null) sp 00007ff9a2af8458
                            error 14 in ganesha.nfsd[559170ef3000+1a4000]
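
The "error 14" field in that kernel line can be decoded by hand. On x86 it is the page-fault error code, and the kernel normally prints it in hexadecimal, so it is read here as 0x14 (that formatting is an assumption worth checking against your kernel). The sketch below decodes the architectural bits; `struct pf_info`, `decode_pf_error` and `example` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits. */
#define PF_PROT  (1u << 0)	/* 0: page not present, 1: protection violation */
#define PF_WRITE (1u << 1)	/* 0: read access, 1: write access */
#define PF_USER  (1u << 2)	/* 1: fault happened in user mode */
#define PF_RSVD  (1u << 3)	/* 1: reserved bit set in a page-table entry */
#define PF_INSTR (1u << 4)	/* 1: fault on an instruction fetch */

struct pf_info {
	bool protection;
	bool write;
	bool user;
	bool reserved;
	bool instr_fetch;
};

static struct pf_info decode_pf_error(uint32_t code)
{
	struct pf_info info = {
		.protection  = (code & PF_PROT) != 0,
		.write       = (code & PF_WRITE) != 0,
		.user        = (code & PF_USER) != 0,
		.reserved    = (code & PF_RSVD) != 0,
		.instr_fetch = (code & PF_INSTR) != 0,
	};

	return info;
}

/* "error 14" read as hexadecimal 0x14. */
static void example(void)
{
	struct pf_info info = decode_pf_error(0x14);

	assert(info.user && info.instr_fetch);
	assert(!info.protection && !info.write && !info.reserved);
}
```

Read that way, 0x14 is a user-mode instruction fetch from a page that is not present. Together with "segfault at 0" and "ip (null)", that is what a call through a NULL function pointer looks like.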


                        Nothing in my ganesha.log

                        These are the log settings from my ganesha.conf:

                            LOG {
                                     ## Default log level for all components
                                     Default_Log_Level = DEBUG;

                                     ## Configure per-component log levels.
                                     #Components {
                                             #FSAL = INFO;
                                             #NFS4 = EVENT;
                                     #}

                                     ## Where to log
                                     Facility {
                                             name = FILE;
                                             destination = "/var/log/ganesha.log";
                                             enable = active;
                                     }
                            }


                        This is an example of one of my exports (they're
                        all NFSv3 with the VFS FSAL):

                            EXPORT
                            {
                                     Export_Id = 80;
                                     Path = /mnt/dir;
                                     Pseudo = /mnt/dir;
                                     Access_Type = RW;
                                     Protocols = 3;
                                     Transports = TCP;
                                     Squash = no_root_squash;
                                     Disable_ACL=False;
                                     Filesystem_Id = 101.1;
                                     CLIENT {
                                        Clients = *;
                                        Squash = None;
                                        Access_Type = RW;
                                     }
                                     FSAL {
                                           Name = VFS;
                                      }
                            }


                        The exports are mounted on CentOS 7.4 clients
                        with autofs-5.0.7 and
                        nfs-utils-1.3.0-0.48.el7_4.x86_64

                        This crash occurred approximately 2 hours after
                        I increased the number of clients accessing the
                        server by about five. I don't know if that's
                        related.

                        Could someone help me troubleshoot this please?

                        Many thanks
                        David





                        _______________________________________________
                        Nfs-ganesha-devel mailing list
                        Nfs-ganesha-devel@lists.sourceforge.net
                        <mailto:Nfs-ganesha-devel@lists.sourceforge.net>
                        https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel




