I do not know if the two problems are related.  In a 24 hour test run,
using upstart to restart Ganesha NFS when it dies, I saw 17 assert
failures and 4 segfault failures.

Timing-wise, they did not happen together: there were minutes, and in
some cases over an hour, between a segfault crash and an assert crash.
I will look through the segfault core files to see if any of the
threads are in dec_state_owner_ref().

Thanks,
Eric

On Wed, Nov 2, 2016 at 6:40 AM, Daniel Gryniewicz <d...@redhat.com> wrote:
> These appear to be a use-after-free on an owner (a refcount bug, likely?).
>
> Daniel
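Daniel's diagnosis can be illustrated with a minimal sketch, assuming a generic atomically reference-counted object. The `struct owner`, `owner_new()`, `owner_get()` and `owner_put()` names are hypothetical and are not Ganesha's state-owner API. One unbalanced put frees the object while another thread still holds a pointer to it (a use-after-free, which can surface as a segfault), and a later put on the same object can trip a check like `assert(refcount > 0)`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; NOT Ganesha's state owner type. */
struct owner {
	atomic_int refcount;
	/* ... payload, e.g. a list of states ... */
};

/* Allocate with one reference, owned by the creator. */
static struct owner *owner_new(void)
{
	struct owner *o = malloc(sizeof(*o));

	if (o != NULL)
		atomic_init(&o->refcount, 1);
	return o;
}

/* Take a reference before handing the pointer to another thread. */
static void owner_get(struct owner *o)
{
	atomic_fetch_add(&o->refcount, 1);
}

/* Drop one reference and return the remaining count. The caller must not
 * touch 'o' after a put that returns 0, because the object is freed. */
static int owner_put(struct owner *o)
{
	int old = atomic_fetch_sub(&o->refcount, 1);

	/* An extra put drives the count to zero or below. If the object was
	 * already freed, this access is itself a use-after-free, so the
	 * assertion is only a best-effort detector. */
	assert(old > 0);
	if (old == 1)
		free(o); /* last reference gone */
	return old - 1;
}
```

Each holder takes a reference with `owner_get()` before using the pointer and drops it with exactly one `owner_put()`. The assertion catches only some imbalances: once the object is freed, any further access is undefined behavior and may instead crash elsewhere.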
>
> On 11/02/2016 01:47 AM, Eric Eastman wrote:
>> While testing V2.4.1 to check whether I was still seeing the
>> "assert(refcount > 0)" failure I reported on V2.4.0.3 (it still
>> occurs with 2.4.1), I am also seeing segfaults.  Three of the four
>> segfaults I have seen in my testing of V2.4.1 had identical
>> backtraces, and the fourth was at a different point in the code.
>> From the command "dmesg -T | grep segfault":
>>
>> [Tue Nov  1 06:33:28 2016] ganesha.nfsd[33814]: segfault at 0 ip
>> 00000000004cdcad sp 00007fae1df92160 error 6 in
>> ganesha.nfsd[400000+1bb000]
>> [Tue Nov  1 07:02:17 2016] ganesha.nfsd[37754]: segfault at 0 ip
>> 00000000004cdcad sp 00007f3b9c366160 error 6 in
>> ganesha.nfsd[400000+1bb000]
>> [Tue Nov  1 11:17:07 2016] ganesha.nfsd[39697]: segfault at a0 ip
>> 00007f4b0ed4f414 sp 00007f4ab11422b0 error 4 in
>> libpthread-2.19.so[7f4b0ed45000+19000]
>> [Tue Nov  1 14:32:18 2016] ganesha.nfsd[41935]: segfault at 0 ip
>> 00000000004cdcad sp 00007f968ad86160 error 6 in
>> ganesha.nfsd[400000+1bb000]
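The `error` field in these kernel messages is the x86 page-fault error code. As a minimal sketch, assuming the standard bit layout (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode, bit 4: instruction fetch), the helper below turns the number into words. The name `decode_pf_error()` is hypothetical.

```c
#include <stdio.h>

/* Decode the "error N" field of a Linux x86 "segfault at ADDR ... error N"
 * message into a human-readable string written to buf.
 *   bit 0: 0 = page not present, 1 = protection violation
 *   bit 1: 0 = read,             1 = write
 *   bit 2: 0 = kernel mode,      1 = user mode
 *   bit 4: instruction fetch
 * Returns the snprintf() result. */
static int decode_pf_error(unsigned long code, char *buf, size_t len)
{
	const char *mode = (code & 0x4) ? "user" : "kernel";
	const char *access = (code & 0x10) ? "instruction fetch" :
			     (code & 0x2) ? "write" : "read";
	const char *cause = (code & 0x1) ? "protection violation" :
					   "page not present";

	return snprintf(buf, len, "%s-mode %s, %s", mode, access, cause);
}
```

By that decoding, "error 6" at address 0 is a user-mode write to an unmapped page at NULL, consistent with the `left->next = elt` store through `left=0x0` in the first backtrace, and "error 4" at address 0xa0 is a user-mode read a short distance past NULL, which is typical of reading a structure member through a NULL base pointer. The `ip` value 4cdcad in the "error 6" lines also matches frame #0 (`0x00000000004cdcad in __glist_add`) of that backtrace.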
>>
>> As I was debugging the assert issue, I had the system configured to
>> capture core dumps.  All three of the "error 6" backtraces looked the
>> same:
>>
>> ...
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>> Core was generated by `/opt/keeper/bin/ganesha.nfsd -F -L
>> /var/log/nfs-ganesha.log'.
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> #0  0x00000000004cdcad in __glist_add (left=0x0, right=0x7fadcc1ca3a8,
>> elt=0x7fadc94e0f50) at
>> /home/keeper/work/ganesha/2.4.1-debug/ganeshabuilder/nfs-ganesha/src/include/gsh_list.h:78
>> 78 left->next = elt;
>>
>> (gdb) bt
>> #0  0x00000000004cdcad in __glist_add (left=0x0, right=0x7fadcc1ca3a8,
>> elt=0x7fadc94e0f50) at
>> /home/keeper/work/ganesha/2.4.1-debug/ganeshabuilder/nfs-ganesha/src/include/gsh_list.h:78
>> #1  0x00000000004cdce9 in glist_add_tail (head=0x7fadcc1ca3a8,
>> elt=0x7fadc94e0f50) at
>> /home/keeper/work/ganesha/2.4.1-debug/ganeshabuilder/nfs-ganesha/src/include/gsh_list.h:86
>> #2  0x00000000004cef3b in state_add_impl (obj=0x7fadb2a41638,
>> state_type=STATE_TYPE_SHARE, state_data=0x7fae1df923d0,
>> owner_input=0x7fadcc1ca200, state=0x7fae1df92da0,
>> refer=0x7fae1df924f0) at
>> /home/keeper/work/ganesha/2.4.1-debug/ganeshabuilder/nfs-ganesha/src/SAL/nfs4_state.c:213
>> #3  0x000000000047333c in open4_ex (arg=0x7fadace1a8a8,
>> data=0x7fae1df92e80, res_OPEN4=0x7fadce951948,
>> clientid=0x7fadc8819600, owner=0x7fadcc1ca200,
>> file_state=0x7fae1df92da0, new_state=0x7fae1df92d68) at
>> /home/keeper/work/ganesha/2.4.1-debug/ganeshabuilder/nfs-ganesha/src/Protocols/NFS/nfs4_op_open.c:1535
>> #4  0x000000000047432e in nfs4_op_open (op=0x7fadace1a8a0,
>> data=0x7fae1df92e80, resp=0x7fadce951940) at
>> /home/keeper/work/ganesha/2.4.1-debug/ganeshabuilder/nfs-ganesha/src/Protocols/NFS/nfs4_op_open.c:1844
>> #5  0x000000000045e90d in nfs4_Compound (arg=0x7fadac9d24e8,
>> req=0x7fadac9d2328, res=0x7fadc9466ec0) at
>> /home/keeper/work/ganesha/2.4.1-debug/ganeshabuilder/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:734
>> #6  0x000000000044b9a8 in nfs_rpc_execute (reqdata=0x7fadac9d2300) at
>> /home/keeper/work/ganesha/2.4.1-debug/ganeshabuilder/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1281
>> #7  0x000000000044c317 in worker_run (ctx=0x7fae53fd8680) at
>> /home/keeper/work/ganesha/2.4.1-debug/ganeshabuilder/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1548
>> #8  0x000000000050b288 in fridgethr_start_routine (arg=0x7fae53fd8680)
>> at 
>> /home/keeper/work/ganesha/2.4.1-debug/ganeshabuilder/nfs-ganesha/src/support/fridgethr.c:550
>> #9  0x00007fae59b65182 in start_thread (arg=0x7fae1df94700) at
>> pthread_create.c:312
>> #10 0x00007fae5943b47d in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
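This backtrace faults in `__glist_add()` with `left=0x0`. A minimal sketch of a circular doubly linked list, modeled on the function names and the faulting statement `left->next = elt` in the trace but not copied from Ganesha's `gsh_list.h`, shows why: `glist_add_tail(head, elt)` inserts between `head->prev` and `head`, so if the head's memory has been cleared or reused, `head->prev` is NULL and the first store through it faults. Note also that `right=0x7fadcc1ca3a8` equals `owner_input` (0x7fadcc1ca200) plus 0x1a8, which suggests the list head is embedded in the owner object; that is an inference from the addresses, not something the trace proves. The `list_head_is_sane()` helper is a hypothetical debug check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified circular doubly linked list, modeled on the names in the
 * backtrace (glist_add_tail / __glist_add); not Ganesha's actual header. */
struct glist_head {
	struct glist_head *next;
	struct glist_head *prev;
};

static void glist_init(struct glist_head *head)
{
	head->next = head; /* an empty list points at itself */
	head->prev = head;
}

static void __glist_add(struct glist_head *left, struct glist_head *right,
			struct glist_head *elt)
{
	elt->prev = left;
	elt->next = right;
	left->next = elt; /* faults here if left == NULL */
	right->prev = elt;
}

static void glist_add_tail(struct glist_head *head, struct glist_head *elt)
{
	__glist_add(head->prev, head, elt); /* insert just before head */
}

/* Hypothetical debug check: a live head has non-NULL links whose
 * neighbors point back at it. A zeroed head fails immediately. */
static bool list_head_is_sane(const struct glist_head *head)
{
	return head->next != NULL && head->prev != NULL &&
	       head->next->prev == head && head->prev->next == head;
}
```

Calling a check like `list_head_is_sane()` before `glist_add_tail()` in a debug build would turn this crash into a diagnosable failure at the point of insertion, although it cannot say who freed or cleared the owner.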
>>
>> The "error 4" backtrace:
>> ...
>> [New LWP 39788]
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>> Core was generated by `/opt/keeper/bin/ganesha.nfsd -F -L
>> /var/log/nfs-ganesha.log'.
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> #0  __GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:66
>> 66 ../nptl/pthread_mutex_lock.c: No such file or directory.
>>
>> (gdb) bt
>> #0  __GI___pthread_mutex_lock (mutex=0x0) at ../nptl/pthread_mutex_lock.c:66
>> #1  0x00000000004d7114 in nfs4_Check_Stateid (stateid=0x7f4a779e50a8,
>> fsal_obj=0x7f4a6c303038, state=0x7f4ab11424f0, data=0x7f4ab1142e80,
>> flags=63, owner_seqid=0, check_seqid=false, tag=0x569d7b "WRITE") at
>> /home/keeper/work/ganesha/2.4.1-debug/ganeshabuilder/nfs-ganesha/src/SAL/nfs4_state_id.c:1076
>> #2  0x00000000004852b8 in nfs4_write (op=0x7f4a779e50a0,
>> data=0x7f4ab1142e80, resp=0x7f4a75e14dc0, io=FSAL_IO_WRITE, info=0x0)
>> at 
>> /home/keeper/work/ganesha/2.4.1-debug/ganeshabuilder/nfs-ganesha/src/Protocols/NFS/nfs4_op_write.c:213
>> #3  0x0000000000485d1b in nfs4_op_write (op=0x7f4a779e50a0,
>> data=0x7f4ab1142e80, resp=0x7f4a75e14dc0) at
>> /home/keeper/work/ganesha/2.4.1-debug/ganeshabuilder/nfs-ganesha/src/Protocols/NFS/nfs4_op_write.c:482
>> #4  0x000000000045e90d in nfs4_Compound (arg=0x7f4a779e4268,
>> req=0x7f4a779e40a8, res=0x7f4a7d4c5c40) at
>> /home/keeper/work/ganesha/2.4.1-debug/ganeshabuilder/nfs-ganesha/src/Protocols/NFS/nfs4_Compound.c:734
>> #5  0x000000000044b9a8 in nfs_rpc_execute (reqdata=0x7f4a779e4080) at
>> /home/keeper/work/ganesha/2.4.1-debug/ganeshabuilder/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1281
>> #6  0x000000000044c317 in worker_run (ctx=0x7f4b093e9100) at
>> /home/keeper/work/ganesha/2.4.1-debug/ganeshabuilder/nfs-ganesha/src/MainNFSD/nfs_worker_thread.c:1548
>> #7  0x000000000050b288 in fridgethr_start_routine (arg=0x7f4b093e9100)
>> at 
>> /home/keeper/work/ganesha/2.4.1-debug/ganeshabuilder/nfs-ganesha/src/support/fridgethr.c:550
>> #8  0x00007f4b0ed4d182 in start_thread (arg=0x7f4ab1144700) at
>> pthread_create.c:312
>> #9  0x00007f4b0e62347d in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>>
>> I have the cores, so if you need a backtrace of all threads or other
>> info, just let me know.
>>
>> The test I was running was the same one described in the email with
>> the subject "assert in dec_state_owner_ref() with V2.4.0.3": multiple
>> worker processes create 1000 uniquely named directories and write 11
>> 256K files in each one.  The test then does an "rm -rf" on the
>> directory tree and starts over.
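A minimal single-worker sketch of one cycle of that workload, assuming POSIX `mkdir()`/`open()`/`write()` and `nftw()` for the "rm -rf" step. The directory naming scheme (`w<worker>_d<index>`) and the `run_cycle()` and `write_file()` helpers are hypothetical; the counts come from the description above, and "256K" is assumed to mean 256 KiB. Running several copies concurrently, each with its own worker number, approximates the multiple worker processes.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* nftw callback: remove each entry; FTW_DEPTH visits children first. */
static int rm_entry(const char *path, const struct stat *sb, int type,
		    struct FTW *ftw)
{
	(void)sb;
	(void)type;
	(void)ftw;
	return remove(path);
}

/* Write one file of 'size' bytes filled with a fixed pattern. */
static int write_file(const char *path, size_t size)
{
	char buf[4096];
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1;
	memset(buf, 'x', sizeof(buf));
	while (size > 0) {
		size_t chunk = size < sizeof(buf) ? size : sizeof(buf);
		ssize_t n = write(fd, buf, chunk);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			close(fd);
			return -1;
		}
		size -= (size_t)n;
	}
	return close(fd);
}

/* One cycle: create ndirs directories holding nfiles files of fsize bytes
 * each under 'root', then delete them (like "rm -rf"). Returns 0 on success. */
static int run_cycle(const char *root, int worker, int ndirs, int nfiles,
		     size_t fsize)
{
	char dir[1024], file[1100];

	for (int d = 0; d < ndirs; d++) {
		snprintf(dir, sizeof(dir), "%s/w%d_d%d", root, worker, d);
		if (mkdir(dir, 0755) != 0)
			return -1;
		for (int f = 0; f < nfiles; f++) {
			snprintf(file, sizeof(file), "%s/f%d", dir, f);
			if (write_file(file, fsize) != 0)
				return -1;
		}
	}
	for (int d = 0; d < ndirs; d++) {
		snprintf(dir, sizeof(dir), "%s/w%d_d%d", root, worker, d);
		if (nftw(dir, rm_entry, 16, FTW_DEPTH | FTW_PHYS) != 0)
			return -1;
	}
	return 0;
}
```

Calling `run_cycle("/C2-NFS4", worker, 1000, 11, 256 * 1024)` in a loop from each worker against the NFS mount would roughly match the description; the real test harness may differ in ordering and naming.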
>>
>> Information on the test system and versions:
>>
>> Ganesha version:
>> # ganesha.nfsd -v
>> nfs-ganesha compiled on Oct 30 2016 at 22:46:56
>> Release = V2.4.1
>> Release comment = GANESHA file server is 64 bits compliant and
>> supports NFS v3,4.0,4.1 (pNFS) and 9P
>> Git HEAD = a146801b8e29580697391fb7f165ae9ead023894
>> Git Describe = V2.4.1-0-ga146801
>>
>> The OS is Ubuntu 14.04 with a 4.8.4 x86_64 kernel on an 8-processor
>> system.  The backing file system is ext4.
>>
>> The client is mounting the NFS server with the fstab line:
>>    ede-c2-gw01:/var/top /C2-NFS4 nfs4 rw,hard,noauto,vers=4.2  0 0
>>
>> # cat ganesha.conf
>> LOG {
>>     components {
>>        ALL = INFO;
>>     }
>> }
>> EXPORT_DEFAULTS {
>>    SecType = none, sys;
>>    Protocols = 3, 4;
>>    Transports = TCP;
>> }
>> EXPORT {
>>     Export_ID = 43;
>>     Path = /var/top;
>>     Pseudo = /var/top;
>>     Access_Type = RW;
>>     Squash = No_Root_Squash;
>>     FSAL {
>>         Name = VFS;
>>     }
>> }
>>
>> Cmake command:
>> # cmake -DCMAKE_INSTALL_PREFIX=/opt/keeper -DALLOCATOR=jemalloc
>> -DUSE_ADMIN_TOOLS=ON -DUSE_DBUS=ON ../src
>>
>> Thanks,
>> Eric
>>
>> ------------------------------------------------------------------------------
>> Developer Access Program for Intel Xeon Phi Processors
>> Access to Intel Xeon Phi processor-based developer platforms.
>> With one year of Intel Parallel Studio XE.
>> Training and support from Colfax.
>> Order your platform today. http://sdm.link/xeonphi
>> _______________________________________________
>> Nfs-ganesha-devel mailing list
>> Nfs-ganesha-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>>
>
>
