Just reporting back that the issue we had seems to have been solved. In our case it was fixed by applying hotfix packages from IBM. We did this in December, and I can no longer trigger the issue. Hopefully it will stay fixed when the system comes back under full production load in January.
Also, as far as I can see, Scale 5.0.2.2 already includes these packages.

Regards,
Andreas Mattsson

____________________________________________
Andreas Mattsson
Systems Engineer
MAX IV Laboratory, Lund University
P.O. Box 118, SE-221 00 Lund, Sweden
Visiting address: Fotongatan 2, 224 84 Lund
Mobile: +46 706 64 95 44
andreas.matts...@maxiv.lu.se
www.maxiv.se

________________________________
From: gpfsug-discuss-boun...@spectrumscale.org <gpfsug-discuss-boun...@spectrumscale.org> on behalf of Ulrich Sibiller <u.sibil...@science-computing.de>
Sent: December 13, 2018 14:52
To: gpfsug-discuss@spectrumscale.org
Subject: Re: [gpfsug-discuss] Filesystem access issues via CES NFS

On 23.11.2018 14:41, Andreas Mattsson wrote:
> Yes, this is repeating.
>
> We've ascertained that it has nothing at all to do with file operations on the GPFS side.
>
> Randomly throughout the filesystem mounted via NFS, ls or file access will give:
>
>     ls: reading directory /gpfs/filessystem/test/testdir: Invalid argument
>
> Trying again later might work on that folder, but might fail somewhere else.
>
> We have tried exporting the same filesystem via a standard kernel NFS server instead of the CES Ganesha NFS, and then the problem doesn't exist.
>
> So it is definitely related to the Ganesha NFS server, or to its interaction with the file system.
>
> Will see if I can get a tcpdump of the issue.

We see this, too. We cannot trigger it. Fortunately I have managed to capture some logs with debugging enabled. I have now dug into the Ganesha 2.5.3 code and I think the netgroup caching is the culprit.
Here is some FULL_DEBUG output:

2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :Check for address 1.2.3.4 for export id 1 path /gpfsexport
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcf7fe0 NETGROUP_CLIENT: netgroup1 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcfe320 NETGROUP_CLIENT: netgroup2 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] client_match :EXPORT :M_DBG :Match V4: 0xcfe380 NETGROUP_CLIENT: netgroup3 (options=421021e2root_squash , RWrw, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_ip_name_get :DISP :F_DBG :Cache get hit for 1.2.3.4->client1.domain
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :EXPORT (options=03303002 , , , , , -- Deleg, , )
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :EXPORT_DEFAULTS (options=42102002root_squash , ----, 3--, ---, TCP, ----, Manage_Gids , , anon_uid= -2, anon_gid= -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :default options (options=03303002root_squash , ----, 34-, UDP, TCP, ----, No Manage_Gids, -- Deleg, anon_uid= -2, anon_gid= -2, none, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] export_check_access :EXPORT :M_DBG :Final options (options=42102002root_squash , ----, 3--, ---, TCP, ----, Manage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys)
2018-12-13 11:53:41 : epoch 0009008d : server1 : gpfs.ganesha.nfsd-258762[work-250] nfs_rpc_execute :DISP :INFO :DISP: INFO: Client ::ffff:1.2.3.4 is not allowed to access Export_Id 1 /gpfsexport, vers=3, proc=18

The client "client1" is definitely a member of "netgroup1". But the NETGROUP_CLIENT lookups for "netgroup2" and "netgroup3" can only happen if the netgroup caching code reports that "client1" is NOT a member of "netgroup1".

I have also opened a support case with IBM for this.

@Malahal: it looks like you wrote the netgroup caching code; feel free to ask for further details if required.

Kind regards,

Ulrich Sibiller

--
Dipl.-Inf. Ulrich Sibiller
science + computing ag
System Administration
Hagellocher Weg 73
72070 Tuebingen, Germany
https://atos.net/de/deutschland/sc

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
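[Editor's note: the suspected failure mode can be sketched in a few lines. This is a toy model in Python, not Ganesha's actual C implementation — the class, function, and netgroup names are illustrative. The point is that a stale *negative* cache entry for (netgroup1, client1) makes the client-match loop skip every NETGROUP_CLIENT entry, producing exactly the "not allowed to access" outcome in the log even though a fresh lookup (innetgr(3) in the real code) would say the client is a member.]

```python
import time

class NetgroupCache:
    """Toy TTL cache for netgroup membership results.

    A negative entry that is wrong (or has gone stale) makes a real
    member look like a non-member until the entry expires.
    """
    def __init__(self, ttl_seconds=600):
        self.ttl = ttl_seconds
        self._entries = {}  # (netgroup, host) -> (is_member, stored_at)

    def lookup(self, netgroup, host, resolver, now=None):
        now = time.time() if now is None else now
        entry = self._entries.get((netgroup, host))
        if entry is not None:
            is_member, stored_at = entry
            if now - stored_at < self.ttl:
                return is_member  # served from cache, right or wrong
        # Cache miss or expired entry: ask the real source and re-cache.
        is_member = resolver(netgroup, host)
        self._entries[(netgroup, host)] = (is_member, now)
        return is_member

def client_match(exports, host, cache, resolver):
    """Return the options of the first export client entry whose netgroup
    contains `host` -- mirroring the repeated NETGROUP_CLIENT probes
    (netgroup1, netgroup2, netgroup3) visible in the log above."""
    for netgroup, export_opts in exports:
        if cache.lookup(netgroup, host, resolver):
            return export_opts
    return None  # -> "Client ... is not allowed to access Export_Id 1"

# Ground truth: client1 really is a member of netgroup1 only.
def real_innetgr(netgroup, host):
    return netgroup == "netgroup1" and host == "client1.domain"

exports = [("netgroup1", "RW"), ("netgroup2", "RW"), ("netgroup3", "RW")]
cache = NetgroupCache(ttl_seconds=600)

# A bogus negative entry (however it got there) poisons the cache:
# all three netgroups report "no match" and access is denied.
cache._entries[("netgroup1", "client1.domain")] = (False, time.time())
print(client_match(exports, "client1.domain", cache, real_innetgr))  # None

# Once that entry expires, the very same request succeeds again --
# matching the intermittent, self-healing behaviour reported above.
cache._entries[("netgroup1", "client1.domain")] = (False, time.time() - 601)
print(client_match(exports, "client1.domain", cache, real_innetgr))  # RW
```

This also matches the symptom that retrying later works: once the bad cache entry ages out, the next lookup consults the real netgroup database again.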