While testing Ganesha NFS V2.4.0.3 with the CEPH FSAL against a Ceph
file system, I am seeing the ganesha.nfsd process die on an assert call
multiple times per hour.  I have also seen it die at the same place in
the code using the VFS FSAL with an ext4 file system, but it dies much
less often.

It is dying at line 917 in src/SAL/state_misc.c, which is reached from
line 1010 of the same file.  The assert call is in
dec_state_owner_ref() at the line:

       assert(refcount > 0);

Looking at the core files and adding in some debugging code confirms
that refcount is -1 when the assert call is made.

It looks like the owner refcount is being driven to -1 in
uncache_nfs4_owner(), but since it happens only occasionally, I think
it is a race condition.
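
For illustration only, here is a minimal sketch of how a race of this
kind could take a reference count to -1.  It is not the Ganesha code:
struct demo_owner and all of the demo_* functions are made-up names,
and the "cached" flag is an assumption about how an uncache path might
decide to drop the cache's reference.  The interleaving is replayed in
a single thread so the result is deterministic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a cached owner; NOT Ganesha's owner struct. */
struct demo_owner {
	atomic_int refcount;	/* references currently held */
	atomic_bool cached;	/* true while the cache holds one of them */
};

static void demo_owner_init(struct demo_owner *o)
{
	atomic_init(&o->refcount, 1);	/* only the cache's reference */
	atomic_init(&o->cached, true);
}

/* Drop one reference and return the resulting count. */
static int demo_dec_ref(struct demo_owner *o)
{
	return atomic_fetch_sub(&o->refcount, 1) - 1;
}

/* Racy pattern, step 1: each caller checks the flag on its own. */
static bool demo_racy_check(struct demo_owner *o)
{
	return atomic_load(&o->cached);
}

/* Racy pattern, step 2: clear the flag, then drop the cache's reference. */
static int demo_racy_drop(struct demo_owner *o)
{
	atomic_store(&o->cached, false);
	return demo_dec_ref(o);
}

/* Safer pattern: atomic_exchange lets exactly one caller observe "true". */
static bool demo_uncache_once(struct demo_owner *o)
{
	if (!atomic_exchange(&o->cached, false))
		return false;	/* another caller already uncached it */
	(void)demo_dec_ref(o);
	return true;
}

/* Replay "A checks, B checks, A drops, B drops"; return the final count. */
static int demo_replay_racy(void)
{
	struct demo_owner o;
	demo_owner_init(&o);
	bool a_sees = demo_racy_check(&o);
	bool b_sees = demo_racy_check(&o);
	if (a_sees)
		demo_racy_drop(&o);
	if (b_sees)
		demo_racy_drop(&o);	/* second drop of the same reference */
	return atomic_load(&o.refcount);
}

/* The same two callers using the exchange-based version. */
static int demo_replay_once(void)
{
	struct demo_owner o;
	demo_owner_init(&o);
	demo_uncache_once(&o);
	demo_uncache_once(&o);	/* sees "false" and does nothing */
	return atomic_load(&o.refcount);
}
```

In this sketch demo_replay_racy() ends with the count below zero, which
is the condition an assert(refcount > 0) style check would trip on,
while demo_replay_once() leaves it at 0.  Whether the real failure
follows this pattern is not established here.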

Info on the build:

Host OS is Ubuntu 14.04 with a 4.8.2 x86_64 kernel on an 8-processor system

Cmake command:
# cmake -DCMAKE_INSTALL_PREFIX=/opt/keeper -DALLOCATOR=jemalloc \
    -DUSE_ADMIN_TOOLS=ON -DUSE_DBUS=ON ../src

# ganesha.nfsd -v
ganesha.nfsd compiled on Oct 17 2016 at 16:50:18
Release = V2.4.0.3
Release comment = GANESHA file server is 64 bits compliant and
supports NFS v3,4.0,4.1 (pNFS) and 9P
Git HEAD = 0f55a9a97a4bf232fb0e42542e4ca7491fbf84ce
Git Describe = V2.4.0.3-0-g0f55a9a

# ceph -v
ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)

# cat ganesha.conf
LOG {
    components {
       ALL = INFO;
    }
}

EXPORT_DEFAULTS {
    SecType = none, sys;
    Protocols = 3, 4;
    Transports = TCP;
}

# define CephFS export
EXPORT {
    Export_ID = 42;
    Path = /top;
    Pseudo = /top;
    Access_Type = RW;
    Squash = No_Root_Squash;
    FSAL {
        Name = CEPH;
    }
}

The VFS export for the ext4 tests was:

# define VFS export
EXPORT {
    Export_ID = 43;
    Path = /var/top;
    Pseudo = /var/top;
    Access_Type = RW;
    Squash = No_Root_Squash;
    FSAL {
        Name = VFS;
    }
}

The test used 2 Ubuntu 14.04 NFS clients, each running 6 processes,
writing 11,000 256k files in separate directory trees with 11 files
per lowest-level node.  On each Ubuntu client, 3 processes wrote to an
NFS 3 mount and 3 wrote to an NFS 4 mount.  The files are then read and
verified, deleted, and the test restarts.

Regards,
Eric
