I'm playing with running Ganesha under valgrind and helgrind to see if anything drops out from those.
Unfortunately, helgrind reports a lot of data races that have no functional impact (stat collection that doesn't use atomic ops), plus a ton in the ntirpc code. It also seems to misunderstand some atomic ops: I HAVE seen it complain before when something is always accessed using atomic ops, but sometimes while holding a lock and sometimes not. It decides that the unlocked accesses cause a race, even though the atomic op should guarantee safety.

Frank

> -----Original Message-----
> From: Malahal Naineni [mailto:mala...@gmail.com]
> Sent: Tuesday, October 25, 2016 11:22 PM
> To: Eric Eastman <eric.east...@keepertech.com>
> Cc: nfs-ganesha-devel@lists.sourceforge.net
> Subject: Re: [Nfs-ganesha-devel] assert in dec_state_owner_ref() with
> V2.4.0.3
>
> Please post if you have an easy reproducer. We will try to recreate and root
> cause it.
>
> On Wed, Oct 26, 2016 at 6:15 AM, Eric Eastman
> <eric.east...@keepertech.com> wrote:
> > A little more info on this issue. I did a 24 hour run of my test
> > using the POSIX FSAL with an ext4 file system as the backstore, and
> > saw 9 asserts during this test run, all caused by the variable
> > "refcount" ending up at -1. The errors seem to be occurring while
> > running "rm -rf" on a directory with 1000 sub-directories, with each
> > having 11 files in it.
> >
> > This looks to me like a race condition and I am having issues finding
> > the root cause reading through the source code. There are notes from
> > commit e7307c5, dated Jan 5 2016, on "Resolve race between
> > get_state_owner and dec_state_owner_ref differently" so this looks
> > like an area that there has been issues before.
> >
> > If anyone has an idea on what the root problem is or where to look,
> > please let me know, as we cannot use Ganesha NFS if it is going to
> > assert during production.
> >
> > Thanks,
> > Eric
> >
> > On Thu, Oct 20, 2016 at 1:22 AM, Eric Eastman
> > <eric.east...@keepertech.com> wrote:
> >> While testing Ganesha NFS V2.4.0.3 using the CEPH FSAL to a ceph file
> >> system, I am seeing the ganesha.nfsd process die due to an assert
> >> call multiple times per hour. I have also seen it die at the same
> >> place in the code using the VFS FSAL with a ext4 file system, but it
> >> dies much less often.
> >>
> >> It is dying at line 917 in src/SAL/state_misc.c, which is called by
> >> src/SAL/state_misc.c at line 1010. The assert call is in
> >> dec_state_owner_ref() at the line:
> >>
> >>     assert(refcount > 0);
> >>
> >> Looking at the core files and adding in some debugging code confirms
> >> that refcount is -1 when the assert call is made.
> >>
> >> It looks like the owner count is trying to go to -1 in
> >> uncache_nfs4_owner(), but as it occurs only on occasions, I think it
> >> is a race condition.
> >>
> >> Info on the build:
> >>
> >> Host OS is Ubuntu 14.04 with a 4.8.2 x86_64 kernel on a 8 processor
> >> system
> >>
> >> Cmake command:
> >> # cmake -DCMAKE_INSTALL_PREFIX=/opt/keeper -DALLOCATOR=jemalloc
> >> -DUSE_ADMIN_TOOLS=ON -DUSE_DBUS=ON ../src
> >>
> >> # ganesha.nfsd -v
> >> ganesha.nfsd compiled on Oct 17 2016 at 16:50:18
> >> Release = V2.4.0.3
> >> Release comment = GANESHA file server is 64 bits compliant and
> >> supports NFS v3,4.0,4.1 (pNFS) and 9P
> >> Git HEAD = 0f55a9a97a4bf232fb0e42542e4ca7491fbf84ce
> >> Git Describe = V2.4.0.3-0-g0f55a9a
> >>
> >> # ceph -v
> >> ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
> >>
> >> # cat ganesha.conf
> >> LOG {
> >>     components {
> >>         ALL = INFO;
> >>     }
> >> }
> >>
> >> EXPORT_DEFAULTS {
> >>     SecType = none, sys;
> >>     Protocols = 3, 4;
> >>     Transports = TCP;
> >> }
> >>
> >> # define CephFS export
> >> EXPORT {
> >>     Export_ID = 42;
> >>     Path = /top;
> >>     Pseudo = /top;
> >>     Access_Type = RW;
> >>     Squash = No_Root_Squash;
> >>     FSAL {
> >>         Name = CEPH;
> >>     }
> >> }
> >>
> >> The VFS export for the ext4 tests was:
> >>
> >> # define CephFS export
> >> EXPORT {
> >>     Export_ID = 43;
> >>     Path = /var/top;
> >>     Pseudo = /var/top;
> >>     Access_Type = RW;
> >>     Squash = No_Root_Squash;
> >>     FSAL {
> >>         Name = VFS;
> >>     }
> >> }
> >>
> >> The test was 2 Ubuntu 14.04 NFS clients each having 6 processes,
> >> writing 11,000 256k files in separate directory trees with 11 files
> >> per lowest level node. On each Ubuntu client, 3 processes wrote to a
> >> NFS 3 mount and 3 wrote to a NFS 4 mount. The files are then read and
> >> verified, deleted, and the test restarts.
> >>
> >> Regards,
> >> Eric
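
As an illustration of the access pattern described at the top of this message (a value that is always updated with atomic ops, but only sometimes while a lock is held), here is a minimal sketch. It is not Ganesha code: the names counter, lock, locked_worker, unlocked_worker and run_mixed_access_demo are hypothetical, and it assumes the GCC/Clang __atomic builtins rather than Ganesha's own atomic wrappers. Every access is atomic, so the final count should be exact; a race detector that reasons about held locks may still report the unlocked accesses, depending on the tool version and how the binary was built.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define ITERATIONS 100000

/* Hypothetical shared value, standing in for a refcount or a statistic. */
static int32_t counter;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Increments the counter atomically while also holding the mutex. */
static void *locked_worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < ITERATIONS; i++) {
		pthread_mutex_lock(&lock);
		__atomic_add_fetch(&counter, 1, __ATOMIC_SEQ_CST);
		pthread_mutex_unlock(&lock);
	}
	return NULL;
}

/* Increments the same counter atomically without taking the mutex. */
static void *unlocked_worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < ITERATIONS; i++)
		__atomic_add_fetch(&counter, 1, __ATOMIC_SEQ_CST);
	return NULL;
}

/* Runs both workers to completion and returns the final counter value. */
int32_t run_mixed_access_demo(void)
{
	pthread_t a, b;
	int rc;

	__atomic_store_n(&counter, 0, __ATOMIC_SEQ_CST);

	rc = pthread_create(&a, NULL, locked_worker, NULL);
	assert(rc == 0);
	rc = pthread_create(&b, NULL, unlocked_worker, NULL);
	assert(rc == 0);
	(void)rc; /* silences the unused warning when NDEBUG is defined */

	pthread_join(a, NULL);
	pthread_join(b, NULL);

	return __atomic_load_n(&counter, __ATOMIC_SEQ_CST);
}
```

Calling run_mixed_access_demo() from a small main() and running the result under helgrind is one way to check whether a given valgrind version complains about this pattern.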
_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel