I re-ran the same test for 48 hours using the NFS 4.0 mount option, to
the Ganesha NFS 2.4.1 server, with the client NFS fstab entry:

ede-c2-gw01:/var/top /C2-NFS4 nfs4 rw,hard,noauto,vers=4.0  0 0

and I have not seen any asserts or segfaults, so there is something going
on when using vers=4.2 that is not seen with vers=4.0. When using
vers=4.2, I normally see more than 20 asserts or segfaults per 24 hours
when running my test case.
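
For readers new to the thread, the failure discussed in the quoted messages below (refcount reaching -1 before the assert(refcount > 0) in dec_state_owner_ref()) can be illustrated with a minimal sketch in C. This is not Ganesha code and is not a diagnosis of the actual bug. The struct and function names (owner, owner_put, uncache_check, uncache_commit, uncache_once) are hypothetical, and the interleaving is replayed in a single thread so the result is deterministic. It only shows one generic way a count can underflow: two callers both pass a "still cached" check before either one clears the flag, so the single cache reference is dropped twice.

```c
/* Hypothetical illustration only; NOT the Ganesha implementation. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct owner {
    atomic_int refcount; /* references currently held */
    bool cached;         /* true while the cache holds one reference */
};

/* Drop one reference and return the new count. The count is allowed to go
 * negative here so the underflow can be observed instead of aborting. */
static int owner_put(struct owner *o)
{
    return atomic_fetch_sub(&o->refcount, 1) - 1;
}

/* Unsafe pattern, step 1: look at the flag without claiming it. */
static bool uncache_check(const struct owner *o)
{
    return o->cached;
}

/* Unsafe pattern, step 2: clear the flag and drop the cache reference. */
static int uncache_commit(struct owner *o)
{
    o->cached = false;
    return owner_put(o);
}

/* Replays the bad interleaving: both callers check before either commits.
 * Returns the final refcount. */
int replay_double_uncache(void)
{
    struct owner o;
    atomic_init(&o.refcount, 1); /* only the cache reference exists */
    o.cached = true;

    bool a_sees_cached = uncache_check(&o); /* caller A checks */
    bool b_sees_cached = uncache_check(&o); /* caller B checks */

    int count = atomic_load(&o.refcount);
    if (a_sees_cached)
        count = uncache_commit(&o); /* A drops the reference: 1 -> 0 */
    if (b_sees_cached)
        count = uncache_commit(&o); /* B drops it again: 0 -> -1 */
    return count;
}

/* Safer pattern: test-and-clear the flag in one atomic step, so only the
 * caller that actually flips it drops the cache reference. */
struct owner_safe {
    atomic_int refcount;
    atomic_bool cached;
};

static int uncache_once(struct owner_safe *o)
{
    if (atomic_exchange(&o->cached, false))
        return atomic_fetch_sub(&o->refcount, 1) - 1;
    return atomic_load(&o->refcount);
}

/* Same two callers as above, using the atomic test-and-clear. */
int replay_safe_uncache(void)
{
    struct owner_safe o;
    atomic_init(&o.refcount, 1);
    atomic_init(&o.cached, true);

    uncache_once(&o);        /* first caller wins: 1 -> 0 */
    return uncache_once(&o); /* second caller sees the flag already cleared */
}
```

In this sketch replay_double_uncache() ends with the count at -1, which is the value an assert(refcount > 0) placed before the decrement would trip on, while replay_safe_uncache() ends at 0. Whether anything like this happens in the NFSv4.2 code path is an open question.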

I am going to re-run my tests using vers=4.1.
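
Assuming the same server, export, and mount point as the 4.0 run above, the fstab entry for that run would presumably differ only in the version option:

```
ede-c2-gw01:/var/top /C2-NFS4 nfs4 rw,hard,noauto,vers=4.1  0 0
```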

Eric


On Wed, Nov 2, 2016 at 12:20 PM, Frank Filz <ffilz...@mindspring.com> wrote:
> I'm playing with running Ganesha under valgrind and helgrind to see if
> anything drops out from those.
>
> Unfortunately helgrind reports a lot of data races. Some have no
> functional impact (stat collection that doesn't use atomic ops), and a ton
> are in the ntirpc code. It also seems to misunderstand some atomic ops: I
> have seen it complain before when something is always accessed using
> atomic ops, but sometimes while holding a lock and sometimes not. It
> decides that the unlocked accesses cause a race, even though the atomic op
> should guarantee safe access.
>
> Frank
>
>> -----Original Message-----
>> From: Malahal Naineni [mailto:mala...@gmail.com]
>> Sent: Tuesday, October 25, 2016 11:22 PM
>> To: Eric Eastman <eric.east...@keepertech.com>
>> Cc: nfs-ganesha-devel@lists.sourceforge.net
>> Subject: Re: [Nfs-ganesha-devel] assert in dec_state_owner_ref() with
>> V2.4.0.3
>>
>> Please post if you have an easy reproducer. We will try to recreate and
>> root cause it.
>>
>> On Wed, Oct 26, 2016 at 6:15 AM, Eric Eastman
>> <eric.east...@keepertech.com> wrote:
>> > A little more info on this issue.  I did a 24 hour run of my test
>> > using the POSIX FSAL with an ext4 file system as the backstore, and
>> > saw 9 asserts during this test run, all caused by the variable
>> > "refcount" ending up at -1.  The errors seem to be occurring while
>> > running "rm -rf" on a directory with 1000 sub-directories, with each
>> > having 11 files in it.
>> >
>> > This looks to me like a race condition, and I am having trouble finding
>> > the root cause by reading through the source code.  There are notes from
>> > commit e7307c5, dated Jan 5 2016,  on "Resolve race between
>> > get_state_owner and dec_state_owner_ref differently"  so this looks
>> > like an area where there have been issues before.
>> >
>> > If anyone has an idea on what the root problem is or where to look,
>> > please let me know, as we cannot use Ganesha NFS if it is going to
>> > assert during production.
>> >
>> > Thanks,
>> > Eric
>> >
>> > On Thu, Oct 20, 2016 at 1:22 AM, Eric Eastman
>> > <eric.east...@keepertech.com> wrote:
>> >> While testing Ganesha NFS V2.4.0.3 using the CEPH FSAL to a ceph file
>> >> system, I am seeing the ganesha.nfsd process die due to an assert
>> >> call multiple times per hour.  I have also seen it die at the same
>> >> place in the code using the VFS FSAL with an ext4 file system, but it
>> >> dies much less often.
>> >>
>> >> It is dying at line 917 in src/SAL/state_misc.c, which is called by
>> >> src/SAL/state_misc.c at line 1010.  The assert call is in
>> >> dec_state_owner_ref() at the line:
>> >>
>> >>        assert(refcount > 0);
>> >>
>> >> Looking at the core files and adding in some debugging code confirms
>> >> that refcount is -1 when the assert call is made.
>> >>
>> >> It looks like the owner count is trying to go to -1 in
>> >> uncache_nfs4_owner(), but as it occurs only occasionally, I think it
>> >> is a race condition.
>> >>
>> >> Info on the build:
>> >>
>> >> Host OS is Ubuntu 14.04 with a 4.8.2 x86_64 kernel on an 8-processor
>> >> system
>> >>
>> >> Cmake command:
>> >> # cmake -DCMAKE_INSTALL_PREFIX=/opt/keeper -DALLOCATOR=jemalloc
>> >> -DUSE_ADMIN_TOOLS=ON -DUSE_DBUS=ON ../src
>> >>
>> >> # ganesha.nfsd -v
>> >> ganesha.nfsd compiled on Oct 17 2016 at 16:50:18 Release = V2.4.0.3
>> >> Release comment = GANESHA file server is 64 bits compliant and
>> >> supports NFS v3,4.0,4.1 (pNFS) and 9P Git HEAD =
>> >> 0f55a9a97a4bf232fb0e42542e4ca7491fbf84ce
>> >> Git Describe = V2.4.0.3-0-g0f55a9a
>> >>
>> >> # ceph -v
>> >> ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>> >>
>> >> # cat ganesha.conf
>> >> LOG {
>> >>     components {
>> >>        ALL = INFO;
>> >>     }
>> >> }
>> >>
>> >> EXPORT_DEFAULTS {
>> >> SecType = none, sys;
>> >> Protocols = 3, 4;
>> >> Transports = TCP;
>> >> }
>> >>
>> >> # define CephFS export
>> >> EXPORT {
>> >>     Export_ID = 42;
>> >>     Path = /top;
>> >>     Pseudo = /top;
>> >>     Access_Type = RW;
>> >>     Squash = No_Root_Squash;
>> >>     FSAL {
>> >>         Name = CEPH;
>> >>     }
>> >> }
>> >>
>> >> The VFS export for the ext4 tests was:
>> >>
>> >> # define CephFS export
>> >> EXPORT {
>> >>     Export_ID = 43;
>> >>     Path = /var/top;
>> >>     Pseudo = /var/top;
>> >>     Access_Type = RW;
>> >>     Squash = No_Root_Squash;
>> >>     FSAL {
>> >>         Name = VFS;
>> >>     }
>> >> }
>> >>
>> >> The test used 2 Ubuntu 14.04 NFS clients, each running 6 processes,
>> >> writing 11,000 256k files in separate directory trees with 11 files
>> >> per lowest level node. On each Ubuntu client, 3 processes wrote to an
>> >> NFS 3 mount and 3 wrote to an NFS 4 mount. The files are then read and
>> >> verified, deleted, and the test restarts.
>> >>
>> >> Regards,
>> >> Eric
>> >
>> > _______________________________________________
>> > Nfs-ganesha-devel mailing list
>> > Nfs-ganesha-devel@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
