Hi Andreas,

   Indeed, we’re running 2.10.7 from DDN. Unfortunately, updating to a more 
recent Lustre version might be a hard sell to our admins without a solid case. 
It’s an odd problem, since we haven’t been able to correlate the behavior with 
anything; as far as we can tell, it only affects certain paths and users.
   Our primary goal wasn’t the getstripe itself; it was just a convenient, 
universally available proxy command that exhibited the problematic behavior. 
The actual problem in production is that users get permission denied errors 
when programmatically accessing certain files or directories that they should 
be able to access given the group permissions. For example, opening a file from 
a Python script fails; running an os.stat before the read attempt sidesteps the 
issue in some cases (as “ls” does from bash). If it were a localized bug in 
getstripe that would be great, but it seems to be a somewhat deeper problem in 
our Lustre instance.
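A minimal Python sketch of that workaround, for scripts that hit this while a real fix is pending: it retries an open once after stat-ing the path and its parent directory, roughly mimicking what the “ls” does. The helper name `open_with_stat_retry` is hypothetical, and the sketch assumes (only from the behavior described above) that the stat refreshes whatever client-side state the failing open depends on. It works around the symptom; it does not fix anything.

```python
import errno
import os


def open_with_stat_retry(path, mode="rb", retries=1):
    """Open `path`; on EACCES, stat the parent directory and the path, then retry.

    Hypothetical helper encoding the empirical workaround from this thread
    (an os.stat, like an `ls`, sometimes lets a later open succeed).
    """
    for attempt in range(retries + 1):
        try:
            return open(path, mode)
        except PermissionError as exc:
            # Give up on the last attempt, or for EPERM rather than EACCES.
            if attempt == retries or exc.errno != errno.EACCES:
                raise
            # Touch the metadata the way `ls` would before trying again.
            os.stat(os.path.dirname(os.path.abspath(path)))
            os.stat(path)
```

Any other error (for example FileNotFoundError) propagates immediately, so the retry only masks the specific permission-denied symptom discussed here.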
   The directory doesn’t have world permissions. We’re using Lustre to host 
project data, so everything is generally mode o-rwx unless there is a specific 
reason otherwise. stat shows:

  File: ‘/projects/naris/pcm_110819/NARIS_TechBreak2050_missingDPV/StageC_RT5min’
  Size: 32768          Blocks: 64         IO Block: 4096   directory
Device: 83d08f32h/2211483442d      Inode: 162132535332135820  Links: 2
Access: (0770/drwxrwx---)  Uid: (121756/***)   Gid: (130817/   naris)
Context: unconfined_u:object_r:unlabeled_t:s0
Access: 2020-07-02 10:16:52.000000000 -0600
Modify: 2020-04-10 19:40:39.000000000 -0600
Change: 2020-05-14 09:50:37.000000000 -0600
 Birth: -
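To make the expectation concrete: with mode 0770, owner UID 121756 and group GID 130817 as shown above, plain POSIX mode-bit rules give a non-owner who is in group 130817 (as a supplementary group) read and search access. The sketch below is a simplified model of that check; it deliberately ignores ACLs, root privileges and SELinux labels, so it only shows what the mode bits alone should allow. `mode_allows` and `current_user_can` are hypothetical names.

```python
import os
import stat


def mode_allows(st_mode, st_uid, st_gid, uid, gids, want):
    """Simplified POSIX mode-bit check (no ACLs, no root, no SELinux).

    `want` is a combination of os.R_OK, os.W_OK and os.X_OK. Exactly one
    class (owner, group, other) is consulted, as with plain mode bits.
    """
    perms = stat.S_IMODE(st_mode)
    if uid == st_uid:
        bits = (perms >> 6) & 0o7
    elif st_gid in gids:  # primary or supplementary group
        bits = (perms >> 3) & 0o7
    else:
        bits = perms & 0o7
    return (bits & want) == want


def current_user_can(path, want):
    """Apply mode_allows to `path` for the calling process's credentials."""
    st = os.stat(path)
    gids = set(os.getgroups()) | {os.getegid()}
    return mode_allows(st.st_mode, st.st_uid, st.st_gid, os.geteuid(), gids, want)
```

For the directory above, `mode_allows(stat.S_IFDIR | 0o770, 121756, 130817, 131364, groups, os.R_OK | os.X_OK)` is true whenever `groups` contains 130817, which is why the denials look wrong from the client's point of view.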

Could you point me to the Jira site?

Thanks again for all your help.

Chris

From: Andreas Dilger <adil...@whamcloud.com>
Date: Thursday, July 2, 2020 at 3:25 PM
To: Christopher Chang <christopher.ch...@nrel.gov>
Cc: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>, 
"Kaiser, Timothy" <timothy.kai...@nrel.gov>
Subject: Re: [lustre-discuss] Permission denied on lfs getstripe

Chris,
this looks like a bug where "lfs getstripe -M" is not using supplementary 
groups, or similar.  You wrote that the directory has GID=130817, which is not 
the primary GID of the user accessing it, so access must depend on the 
supplementary group permissions.  The "regular" ls access *is* using the 
supplementary GID to allow access, and when the directory is cached on the 
client, "lfs getstripe -M" gets this information out of the client-side cache 
(where the client VFS is locally checking the GID for access permission).

I suspect this hasn't really been an issue in the past because few users use 
"lfs getstripe -M", and most of those are root or are accessing their own 
files/directories, so do not need a supplementary group to access this 
information.  It also seems (but isn't shown) that the directory does not have 
world-read permission?  What does "stat" on this directory show?

Could you please file a ticket in Jira with the details so that this issue can 
be tracked.  I don't know how easy/hard it will be to fix this, since this 
information is obtained via ioctl(), and we don't necessarily want non-owners 
of files to be able to call every ioctl on the file/directory.

Note, it is recommended to use "lfs getstripe -m" (or "--mdt-index") instead 
of "-M" to get the MDT index of a file, since the "-M" option is deprecated.  
That "-M" printed no deprecation warning for you would imply you are running a 
Lustre 2.10 client?  The "-m" option is already available in 2.10, and "-M" 
will print a warning in 2.12 and later.
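As a sketch of the recommended invocation from a script, the following wraps "lfs getstripe -m" with Python's subprocess module and parses the single integer it prints (like the "1" in the transcripts in this thread). The helper name `mdt_index` is hypothetical; it assumes `lfs` is on PATH and that the output for the given path is a single line.

```python
import subprocess


def mdt_index(path, run=subprocess.run):
    """Return the MDT index of `path` as reported by `lfs getstripe -m`.

    `run` defaults to subprocess.run; it is a parameter only so the parsing
    can be exercised without a Lustre client. A failure in lfs (for example
    the EACCES in this thread) surfaces as subprocess.CalledProcessError.
    """
    result = run(
        ["lfs", "getstripe", "-m", path],
        check=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str (Python 3.5+)
    )
    # Assumes a single line such as "1"; multi-line output raises ValueError.
    return int(result.stdout.strip())
```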

I tested this on master and was not able to reproduce the problem.  If I set 
the directory mode=0640, I got permission denied for directories that I didn't 
have supplementary group access on, but for those where I did have group access 
it worked on the first try (after flushing all client locks and dropping all 
caches).  That suggests the problem is already fixed in master, and possibly in 
2.12 as well.
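Reproducing Andreas's test requires starting from a cold client cache. A rough sketch of that reset, to be run as root on a Lustre client, is below. It assumes the commonly used `lctl set_param ldlm.namespaces.*.lru_size=clear` idiom for dropping cached DLM locks (lctl expands the wildcard itself, so no shell is needed) and the standard Linux /proc/sys/vm/drop_caches interface. `reset_client_caches` is a hypothetical helper; `dry_run=True` only returns the planned steps.

```python
import subprocess

# Assumed commands: flush Lustre client DLM locks, then drop the Linux page,
# dentry and inode caches. Both require root on the client.
FLUSH_LOCKS = ["lctl", "set_param", "ldlm.namespaces.*.lru_size=clear"]
DROP_CACHES_FILE = "/proc/sys/vm/drop_caches"


def reset_client_caches(dry_run=False):
    """Flush client locks and drop caches; return the list of steps."""
    steps = [
        ("run", ["sync"]),
        ("run", FLUSH_LOCKS),
        ("write", (DROP_CACHES_FILE, "3\n")),
    ]
    if dry_run:
        return steps
    for kind, arg in steps:
        if kind == "run":
            subprocess.run(arg, check=True)
        else:
            path, value = arg
            with open(path, "w") as f:
                f.write(value)
    return steps
```

After the reset, the first access attempt as the affected user shows whether permission is granted without any earlier "ls" having warmed the cache.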

Cheers, Andreas


On Jul 2, 2020, at 10:26, Chang, Christopher 
<christopher.ch...@nrel.gov> wrote:

Hi Andreas,

   It doesn’t appear to be this issue. I verified the client “id” and server 
“l_getidentity -d” views before and after issuing an “ls” as the user to get 
getstripe working, and there was no change.

Client:
el3:~> id
uid=131364(***) gid=131364(***) groups=131364(***),130033(globus-access),130774(eagle-users),130808(ewer),130817(naris),131016(esp-wps-inputs),131178(lex-access),131237(naermpcm),249837(aces),249945(hpcapps),249996(n-apps) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

el3:~> lfs getstripe -M /projects/naris/pcm_110819/NARIS_TechBreak2050_missingDPV/StageC_RT5min
error opening /projects/naris/pcm_110819/NARIS_TechBreak2050_missingDPV/StageC_RT5min: Permission denied (13)
…
el3:~> ls /projects/naris/pcm_110819/NARIS_TechBreak2050_missingDPV/StageC_RT5min
~Model ( c_RT5min_...

el3:~> lfs getstripe -M /projects/naris/pcm_110819/NARIS_TechBreak2050_missingDPV/StageC_RT5min
1

el3:~> id
uid=131364(***) gid=131364(***) groups=131364(***),130033(globus-access),130774(eagle-users),130808(ewer),130817(naris),131016(esp-wps-inputs),131178(lex-access),131237(naermpcm),249837(aces),249945(hpcapps),249996(n-apps) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

Server:
[root@mds02 ~]# l_getidentity -d 131364
uid=131364 gid=131364,130808,130817,131016,131237,249837,249945,249996
permissions:
  nid             perm
(client does an ls)
[root@mds02 ~]# l_getidentity -d 131364
uid=131364 gid=131364,130808,130817,131016,131237,249837,249945,249996
permissions:
  nid             perm

The relevant gid for the target directory is 130817. I verified that all 3 of 
our MDSs had the same view before and after the “ls”.
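Comparing the two outputs above by eye is error-prone, so here is a small sketch that parses a client `id` line and an MDS `l_getidentity -d` line and reports the differences. The parsing assumes exactly the formats shown in this message, and `parse_id_groups`, `parse_l_getidentity` and `compare_groups` are hypothetical names. Applied to the output above, it shows the relevant GID 130817 on both sides, while 130033, 130774 and 131178 appear only on the client (none of which owns this directory).

```python
import re


def parse_id_groups(id_line):
    """GIDs from `id` output, e.g. 'groups=131364(user),130817(naris),...'."""
    match = re.search(r"groups=(\S+)", id_line)
    if not match:
        return set()
    return {int(gid) for gid in re.findall(r"(\d+)\(", match.group(1))}


def parse_l_getidentity(line):
    """UID and GID set from an `l_getidentity -d` line like 'uid=1 gid=1,2'."""
    uid = int(re.search(r"uid=(\d+)", line).group(1))
    gid_list = re.search(r"gid=([\d,]+)", line).group(1)
    return uid, {int(gid) for gid in gid_list.split(",") if gid}


def compare_groups(client_id_line, mds_line):
    """Split group IDs into client-only, MDS-only and common sets."""
    client = parse_id_groups(client_id_line)
    _, mds = parse_l_getidentity(mds_line)
    return {
        "client_only": client - mds,
        "mds_only": mds - client,
        "common": client & mds,
    }
```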

Thanks; Chris

From: Andreas Dilger <adil...@whamcloud.com>
Date: Sunday, June 28, 2020 at 5:11 PM
To: Christopher Chang <christopher.ch...@nrel.gov>
Cc: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>, 
"Kaiser, Timothy" <timothy.kai...@nrel.gov>
Subject: Re: [lustre-discuss] Permission denied on lfs getstripe

On Jun 26, 2020, at 10:45, Chang, Christopher 
<christopher.ch...@nrel.gov> wrote:

Hi,

   We’re running into an error with a particular directory. It is weird because 
it can be resolved in an unexpected way, but only for a time.
The error manifests as:

el3:out> lfs getstripe -M /projects/naris/pcm_110819/NARIS_TechBreak2050_missingDPV/StageC_RT5min
error opening /projects/naris/pcm_110819/NARIS_TechBreak2050_missingDPV/StageC_RT5min: Permission denied (13)
llapi_semantic_traverse: Failed to open '/projects/naris/pcm_110819/NARIS_TechBreak2050_missingDPV/StageC_RT5min': Permission denied (13)
error: getstripe failed for /projects/naris/pcm_110819/NARIS_TechBreak2050_missingDPV/StageC_RT5min.

The temporary resolution is:
el3:out> ls /projects/naris/pcm_110819/NARIS_TechBreak2050_missingDPV/StageC_RT5min
~Model ( c_RT5min_TechBreak2050_092P_OLd000_001 ) Log.txt  Model c_RT5min_TechBreak2050_092P_OLd000_033 Solution.h5   Model c_RT5min_TechBreak2050_092P_OLd000_062 Solution.h5
…

Then:
el3:out> lfs getstripe -M /projects/naris/pcm_110819/NARIS_TechBreak2050_missingDPV/StageC_RT5min
1
el3:out>

It looks like the user might only have supplementary group access to this file? 
 You could check on the client by running "id" to list the primary user ID and 
supplementary groups, then "ls -ln" on the file to see what group it is owned 
by.

If that is the case, it would indicate that the MDS /etc/group (or other source 
of supplementary group information, like NIS or LDAP, via /etc/nsswitch.conf) 
is not up-to-date with what is on the clients, or you have 
mdt.*.identity_upcall=NONE on the MDS instead of =l_getidentity.  You can test 
what l_getidentity on the MDS thinks the supplementary groups are for a 
particular user by running "l_getidentity -d <uid>" to compare what "id" 
returns on the client.
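The identity_upcall setting mentioned above can be checked on each MDS (as root) with `lctl get_param mdt.*.identity_upcall`. A minimal sketch that runs this and flags any MDT reporting NONE follows; it assumes the usual `name=value` line format of `lctl get_param`, and `parse_get_param` and `upcalls_set_to_none` are hypothetical names.

```python
import subprocess


def parse_get_param(output):
    """Map parameter name -> value from `lctl get_param` 'name=value' lines."""
    params = {}
    for line in output.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines without '='
            params[name.strip()] = value.strip()
    return params


def upcalls_set_to_none(run=subprocess.run):
    """Return the MDT identity_upcall parameters whose value is NONE."""
    result = run(
        ["lctl", "get_param", "mdt.*.identity_upcall"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    params = parse_get_param(result.stdout)
    return sorted(name for name, value in params.items() if value == "NONE")
```

An empty result means every MDT on that server uses an upcall (normally l_getidentity), in which case "l_getidentity -d <uid>" is the next thing to compare against "id" on the client.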

Cheers, Andreas




However, the getstripe command only continues to work for about 10 minutes 
before it goes back to the permission denied errors.
It only happens with a selection of files or directories, so we were thinking 
it might be connected to a particular OSS or MDT, but we’re not sure what to 
look for.
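To pin down the roughly 10-minute window more precisely, a user-space probe can repeatedly try to open the directory and log each transition between success and failure. This is a sketch under assumptions: it uses a plain os.open with O_DIRECTORY as the probe, assuming (not confirmed here) that it fails the same way as the open inside lfs getstripe; `try_open_dir` and `watch_access` are hypothetical names, and the 30-second default interval is arbitrary.

```python
import os
import time


def try_open_dir(path):
    """Return True if `path` opens as a directory, False on permission denied."""
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    except PermissionError:
        return False
    os.close(fd)
    return True


def watch_access(path, interval=30.0, probes=None, probe=try_open_dir,
                 clock=time.time, sleep=time.sleep):
    """Poll probe(path) every `interval` seconds and record state changes.

    Returns a list of (timestamp, accessible) pairs, one per transition.
    With probes=None it polls until interrupted (Ctrl-C), printing each
    transition as it happens.
    """
    transitions = []
    last = None
    count = 0
    while probes is None or count < probes:
        ok = probe(path)
        if ok != last:
            now = clock()
            transitions.append((now, ok))
            stamp = time.strftime("%H:%M:%S", time.localtime(now))
            print(stamp, "accessible" if ok else "permission denied", flush=True)
            last = ok
        count += 1
        if probes is None or count < probes:
            sleep(interval)
    return transitions
```

Started as the affected user right after an “ls”, the printed timestamps show how long access persists before the denials return.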

I am not the Lustre admin, so please forgive incomplete information. If folks 
can request specific command output, preferably from user space, that would 
accelerate my ability to answer questions. If something needs to be run while 
logged into a particular Lustre component (MDT, OSS, etc.), please say so 
explicitly; it is safe to assume that I won’t know that.

We’re running Lustre 2.10.7 provided by DDN on CentOS 7.4. All help 
appreciated, thanks!

Chris

--
Christopher H. Chang, Ph.D.
Computational Scientist
National Renewable Energy Laboratory
15013 Denver West Pkwy., MS ESIF301
Golden, CO 80401


_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud
