I re-ran the same test for 48 hours using the NFS 4.0 mount option against the
Ganesha NFS 2.4.1 server, with this client NFS fstab entry:

ede-c2-gw01:/var/top /C2-NFS4 nfs4 rw,hard,noauto,vers=4.0 0 0

I have not seen any asserts or segfaults, so there is something going on when
using vers=4.2 that is not seen with vers=4.0. When using vers=4.2, I normally
see more than 20 asserts or segfaults per 24 hours when running my test case.

I am going to re-run my tests using vers=4.1.

Eric

On Wed, Nov 2, 2016 at 12:20 PM, Frank Filz <ffilz...@mindspring.com> wrote:
> I'm playing with running Ganesha under valgrind and helgrind to see if
> anything drops out from those.
>
> Unfortunately helgrind seems to show up a lot of data races that either have
> no functional impact (stat collection that doesn't use atomic ops), a ton in
> the ntirpc code, and it also seems to misunderstand some atomic ops (I HAVE
> seen it complain before when something is accessed using atomic ops, but
> sometimes while holding a lock, and sometimes not, it decides the fact that
> there were unlocked accesses causes a race even though the atomic op should
> guarantee).
>
> Frank
>
>> -----Original Message-----
>> From: Malahal Naineni [mailto:mala...@gmail.com]
>> Sent: Tuesday, October 25, 2016 11:22 PM
>> To: Eric Eastman <eric.east...@keepertech.com>
>> Cc: nfs-ganesha-devel@lists.sourceforge.net
>> Subject: Re: [Nfs-ganesha-devel] assert in dec_state_owner_ref() with
>> V2.4.0.3
>>
>> Please post if you have an easy reproducer. We will try to recreate and
>> root cause it.
>>
>> On Wed, Oct 26, 2016 at 6:15 AM, Eric Eastman
>> <eric.east...@keepertech.com> wrote:
>> > A little more info on this issue. I did a 24 hour run of my test
>> > using the POSIX FSAL with an ext4 file system as the backstore, and
>> > saw 9 asserts during this test run, all caused by the variable
>> > "refcount" ending up at -1. The errors seem to be occurring while
>> > running "rm -rf" on a directory with 1000 sub-directories, with each
>> > having 11 files in it.
>> >
>> > This looks to me like a race condition and I am having issues finding
>> > the root cause reading through the source code. There are notes from
>> > commit e7307c5, dated Jan 5 2016, on "Resolve race between
>> > get_state_owner and dec_state_owner_ref differently" so this looks
>> > like an area that there has been issues before.
>> >
>> > If anyone has an idea on what the root problem is or where to look,
>> > please let me know, as we cannot use Ganesha NFS if it is going to
>> > assert during production.
>> >
>> > Thanks,
>> > Eric
>> >
>> > On Thu, Oct 20, 2016 at 1:22 AM, Eric Eastman
>> > <eric.east...@keepertech.com> wrote:
>> >> While testing Ganesha NFS V2.4.0.3 using the CEPH FSAL to a ceph file
>> >> system, I am seeing the ganesha.nfsd process die due to an assert
>> >> call multiple times per hour. I have also seen it die at the same
>> >> place in the code using the VFS FSAL with a ext4 file system, but it
>> >> dies much less often.
>> >>
>> >> It is dying at line 917 in src/SAL/state_misc.c, which is called by
>> >> src/SAL/state_misc.c at line 1010. The assert call is in
>> >> dec_state_owner_ref() at the line:
>> >>
>> >> assert(refcount > 0);
>> >>
>> >> Looking at the core files and adding in some debugging code confirms
>> >> that refcount is -1 when the assert call is made.
>> >>
>> >> It looks like the owner count is trying to go to -1 in
>> >> uncache_nfs4_owner(), but as it occurs only on occasions, I think it
>> >> is a race condition.
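For anyone following the thread, the kind of failure suspected above can be shown with a small deterministic model. This is not Ganesha code: the `Owner` class, its `cached` flag, and both release functions are hypothetical names made up for the sketch. It only shows how a count can end up at -1 when the "is it still cached?" check and the decrement are not one atomic step. Whether that is what actually happens in uncache_nfs4_owner() is not established here.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Owner:
    """Hypothetical stand-in for a cached state owner (not a Ganesha type)."""
    refcount: int = 1          # the cache holds exactly one reference
    cached: bool = True
    lock: threading.Lock = field(default_factory=threading.Lock)


def release_racy_interleaving(owner: Owner) -> int:
    """Replay, step by step, two callers that both test `cached` before
    either of them clears it. Each then drops the cache's single reference."""
    a_saw_cached = owner.cached   # caller A checks
    b_saw_cached = owner.cached   # caller B checks before A has updated
    for saw_cached in (a_saw_cached, b_saw_cached):
        if saw_cached:
            owner.cached = False
            owner.refcount -= 1   # the second pass drives the count below zero
    return owner.refcount


def release_locked(owner: Owner) -> None:
    """Check and decrement as one step under the owner's lock."""
    with owner.lock:
        if owner.cached:
            owner.cached = False
            owner.refcount -= 1


def run_locked_release(n_threads: int = 8) -> int:
    """Have several threads try to release the same owner under the lock."""
    owner = Owner()
    threads = [threading.Thread(target=release_locked, args=(owner,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return owner.refcount


if __name__ == "__main__":
    print("racy interleaving ->", release_racy_interleaving(Owner()))
    print("locked release    ->", run_locked_release())
```

In the replayed interleaving the count passes through 0 and lands on -1, which is the value the assert reports. With the check and the decrement under one lock, only the first caller drops the reference and the count stops at 0.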
>> >>
>> >> Info on the build:
>> >>
>> >> Host OS is Ubuntu 14.04 with a 4.8.2 x86_64 kernel on a 8 processor
>> >> system
>> >>
>> >> Cmake command:
>> >> # cmake -DCMAKE_INSTALL_PREFIX=/opt/keeper -DALLOCATOR=jemalloc
>> >> -DUSE_ADMIN_TOOLS=ON -DUSE_DBUS=ON ../src
>> >>
>> >> # ganesha.nfsd -v
>> >> ganesha.nfsd compiled on Oct 17 2016 at 16:50:18
>> >> Release = V2.4.0.3
>> >> Release comment = GANESHA file server is 64 bits compliant and
>> >> supports NFS v3,4.0,4.1 (pNFS) and 9P
>> >> Git HEAD = 0f55a9a97a4bf232fb0e42542e4ca7491fbf84ce
>> >> Git Describe = V2.4.0.3-0-g0f55a9a
>> >>
>> >> # ceph -v
>> >> ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>> >>
>> >> # cat ganesha.conf
>> >> LOG {
>> >>     components {
>> >>         ALL = INFO;
>> >>     }
>> >> }
>> >>
>> >> EXPORT_DEFAULTS {
>> >>     SecType = none, sys;
>> >>     Protocols = 3, 4;
>> >>     Transports = TCP;
>> >> }
>> >>
>> >> # define CephFS export
>> >> EXPORT {
>> >>     Export_ID = 42;
>> >>     Path = /top;
>> >>     Pseudo = /top;
>> >>     Access_Type = RW;
>> >>     Squash = No_Root_Squash;
>> >>     FSAL {
>> >>         Name = CEPH;
>> >>     }
>> >> }
>> >>
>> >> The VFS export for the ext4 tests was:
>> >>
>> >> # define CephFS export
>> >> EXPORT {
>> >>     Export_ID = 43;
>> >>     Path = /var/top;
>> >>     Pseudo = /var/top;
>> >>     Access_Type = RW;
>> >>     Squash = No_Root_Squash;
>> >>     FSAL {
>> >>         Name = VFS;
>> >>     }
>> >> }
>> >>
>> >> The test was 2 Ubuntu 14.04 NFS clients each having 6 processes,
>> >> writing 11,000 256k files in separate directory trees with 11 files
>> >> per lowest level node. On each Ubuntu client, 3 processes wrote to a
>> >> NFS 3 mount and 3 wrote to a NFS 4 mount. The files are then read and
>> >> verified, deleted, and the test restarts.
>> >>
>> >> Regards,
>> >> Eric

_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
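Since a reproducer was asked for earlier in the thread, here is a rough sketch of the workload as described in the original report: several processes per mount, each writing a tree of 256k files with 11 files per directory, reading them back to verify, then deleting the tree. It is not the actual test program. The function names, the directory and file naming scheme, and the reading of "11,000 files" as 1000 directories per process are all assumptions, and the mount points have to be supplied on the command line.

```python
import hashlib
import os
import shutil
import socket
import sys
from multiprocessing import Process

FILE_SIZE = 256 * 1024      # "256k files" from the report
FILES_PER_DIR = 11          # "11 files per lowest level node"


def payload(seed: str) -> bytes:
    """Deterministic file contents so a later read can be verified."""
    block = hashlib.sha256(seed.encode()).digest()      # 32 bytes
    return block * (FILE_SIZE // len(block))


def one_pass(root: str, n_dirs: int) -> int:
    """Write one tree, read it back and verify it, then delete it.
    Returns the number of files verified."""
    for d in range(n_dirs):
        dpath = os.path.join(root, f"dir{d:04d}")
        os.makedirs(dpath)
        for f in range(FILES_PER_DIR):
            with open(os.path.join(dpath, f"file{f:02d}"), "wb") as fh:
                fh.write(payload(f"{d}/{f}"))
    verified = 0
    for d in range(n_dirs):
        for f in range(FILES_PER_DIR):
            path = os.path.join(root, f"dir{d:04d}", f"file{f:02d}")
            with open(path, "rb") as fh:
                if fh.read() != payload(f"{d}/{f}"):
                    raise RuntimeError(f"verify failed: {path}")
            verified += 1
    shutil.rmtree(root)       # stands in for the "rm -rf" step
    return verified


def worker(mount: str, name: str, n_dirs: int, passes: int) -> None:
    """Repeat write/verify/delete on this worker's own tree."""
    for _ in range(passes):
        one_pass(os.path.join(mount, name), n_dirs)


def run(mounts, procs_per_mount=3, n_dirs=1000, passes=1):
    """Start `procs_per_mount` workers per mount point and wait for them."""
    host = socket.gethostname()   # keeps trees from different clients apart
    procs = []
    for mount in mounts:
        for i in range(procs_per_mount):
            p = Process(target=worker,
                        args=(mount, f"{host}-tree{i}", n_dirs, passes))
            p.start()
            procs.append(p)
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]


if __name__ == "__main__":
    # Usage: script.py <nfs3 mount point> <nfs4 mount point>
    print(run(sys.argv[1:]))
```

Run on each client with one NFS 3 and one NFS 4 mount point as arguments, the defaults give 3 processes per mount and 1000 directories of 11 files per process. Raising `passes` makes the test restart after each delete.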