On 23.11.2018 14:41, Andreas Mattsson wrote:
> Yes, this is repeating.
> We’ve ascertained that it has nothing at all to do with file operations on the
> GPFS side.
> Randomly throughout the filesystem mounted via NFS, ls or file access will give:
>
>   ls: reading directory /gpfs/filessystem/test/testdir: Invalid argument
>
> Trying again later might work on that folder, but might fail somewhere else.
> We have tried exporting the same filesystem via the standard kernel NFS server
> instead of the CES Ganesha-NFS, and then the problem doesn’t exist.
> So it is definitely related to the Ganesha NFS server, or its interaction with
> the file system.
> Will see if I can get a tcpdump of the issue.
We see this, too, but we cannot trigger it on demand. Fortunately, I have managed to capture some
logs with debugging enabled. I have now dug into the Ganesha 2.5.3 code, and I think the netgroup
caching is the culprit.
Here is some FULL_DEBUG output:
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :Check for address 1.2.3.4 for export id 1 path /gpfsexport
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcf7fe0 NETGROUP_CLIENT: netgroup1 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcfe320 NETGROUP_CLIENT: netgroup2 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcfe380 NETGROUP_CLIENT: netgroup3 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :EXPORT (options=03303002 , , , , , -- Deleg, , )
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :EXPORT_DEFAULTS (options=42102002root_squash , ----, 3--, ---, TCP, ----, Manage_Gids , , anon_uid= -2, anon_gid= -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :default options (options=03303002root_squash , ----, 34-, UDP, TCP, ----, No Manage_Gids, -- Deleg, anon_uid= -2, anon_gid= -2, none, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :Final options (options=42102002root_squash , ----, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_rpc_execute :DISP :INFO :DISP: INFO: Client ::ffff:1.2.3.4 is not allowed to access Export_Id 1 /gpfsexport, vers=3, proc=18
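The client "client1" is definitely a member of "netgroup1". However, the client list is checked in
order and the first matching entry wins, so the NETGROUP_CLIENT lookups for "netgroup2" and
"netgroup3" can only happen if the netgroup caching code reports that "client1" is NOT a member of
"netgroup1". The final "not allowed to access" message confirms that none of the three entries matched.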
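To make the inference concrete, here is a minimal Python sketch of first-match evaluation over an export's client list. It assumes that the entries are checked in configured order and that checking stops at the first hit; the names (ClientEntry, match_client, correct_lookup, faulty_lookup) are hypothetical illustrations, not Ganesha's actual C code. The trace list stands in for the "Match V4 ... NETGROUP_CLIENT" debug lines.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ClientEntry:
    # Hypothetical stand-in for one NETGROUP_CLIENT entry of an export.
    netgroup: str
    access: str


def match_client(
    entries: List[ClientEntry],
    hostname: str,
    is_in_netgroup: Callable[[str, str], bool],
    trace: List[str],
) -> Optional[ClientEntry]:
    """Return the first entry whose netgroup contains hostname, else None.

    Assumes first-match semantics: every checked netgroup is appended to
    trace, and the loop stops at the first successful membership test.
    """
    for entry in entries:
        trace.append(entry.netgroup)
        if is_in_netgroup(hostname, entry.netgroup):
            return entry
    return None


entries = [ClientEntry(f"netgroup{i}", "RW") for i in (1, 2, 3)]
membership = {"netgroup1": {"client1.domain"}}


def correct_lookup(host: str, group: str) -> bool:
    return host in membership.get(group, set())


def faulty_lookup(host: str, group: str) -> bool:
    # Hypothetical cache that wrongly answers "not a member" for every group.
    return False


good_trace: List[str] = []
good = match_client(entries, "client1.domain", correct_lookup, good_trace)
bad_trace: List[str] = []
bad = match_client(entries, "client1.domain", faulty_lookup, bad_trace)
print(good, good_trace)  # stops after netgroup1
print(bad, bad_trace)    # tries all three groups, then no match
```

With a correct membership answer only netgroup1 is ever checked. With a lookup that wrongly reports non-membership, all three entries are tried and nothing matches, which is the same pattern as in the log above.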
I have also opened a support case with IBM for this.
@Malahal: It looks like you wrote the netgroup caching code; feel free to ask for further
details if required.
Kind regards,
Ulrich Sibiller
--
Dipl.-Inf. Ulrich Sibiller science + computing ag
System Administration Hagellocher Weg 73
72070 Tuebingen, Germany
https://atos.net/de/deutschland/sc
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss