Re: [gpfsug-discuss] Change uidNumber and gidNumber for billions of files

Simon Thompson Wed, 10 Jun 2020 00:33:48 -0700

Quota … I thought there was a workaround for this.

I think it went along the lines of:


Set the soft quota to what you want.
Set the hard quota 150% more.
Set the grace period to 1 second.

I think the issue is that when you are over soft quota, each operation has to 
quiesce each time until you hit hard/grace period, whereas once you hit grace, 
it no longer does this. I was just looking for the slide deck about this, but 
can’t find it at the moment! Tomer spoke about it at one point.

Simon

From: <[email protected]> on behalf of 
"[email protected]" <[email protected]>
Reply to: "[email protected]" <[email protected]>
Date: Wednesday, 10 June 2020 at 02:16
To: "[email protected]" <[email protected]>
Subject: Re: [gpfsug-discuss] Change uidNumber and gidNumber for billions of 
files

Lohit,

I did this while working @ NASA. I had two tools I used, one affectionately 
known as "luke file walker" (to modify traditional unix permissions) and the 
other known as the "milleniumfacl" (to modify posix ACLs). Stupid jokes aside, 
there were some real technical challenges here.

I don't know if anyone from the NCCS team at NASA is on the list, but if they 
are perhaps they'll jump in if they're willing to share the code :)

From what I recall, I used uthash and the gpfs APIs to store in-memory a hash 
of inodes and their uid/gid information. I then walked the filesystem using the 
gpfs APIs and could look up the given inode in the in-memory hash to view its 
ownership details. Both the inode traversal and directory walk were 
parallelized/threaded. The way I actually executed the chown was particularly 
security-minded. There is a race condition that exists if you chown 
/path/to/file. All it takes is either a malicious user or someone monkeying 
around with the filesystem while it's live to accidentally chown the wrong file 
if a symbolic link ends up in the file path. My workaround was to use openat() 
and fchown() (I think that was it, I played with this quite a bit to get it 
right): for every path to be chown'd I would walk the hierarchy, opening 
each component with the O_NOFOLLOW flag to be sure I didn't accidentally 
stumble across a symlink in the way. I also implemented caching of open path 
component file descriptors since odds are I would be chowning/chgrp'ing files 
in the same directory. That bought me some speed up.
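
A minimal sketch of the component-by-component walk described above, in Python rather than the original C. This is not the NASA code. The names open_dir_nofollow and safe_chown and the dict-based descriptor cache are made up for illustration, and the final step uses fchownat() with AT_SYMLINK_NOFOLLOW (via os.chown) instead of fchown() on an opened file, which is a substitution on my part.

```python
import os

def open_dir_nofollow(root_fd, rel_dir, cache):
    """Return an fd for rel_dir below root_fd, opening one component at a time.

    cache maps already-walked relative directories to open fds, so files in
    the same directory reuse the descriptors instead of re-walking the path.
    """
    fd = root_fd
    walked = ""
    for part in [p for p in rel_dir.split("/") if p]:
        if part in (".", ".."):
            raise ValueError("refusing path component: " + part)
        walked = part if not walked else walked + "/" + part
        if walked not in cache:
            # O_NOFOLLOW: fail if this component is a symlink.
            # O_DIRECTORY: fail if it is not a directory.
            flags = os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW
            cache[walked] = os.open(part, flags, dir_fd=fd)
        fd = cache[walked]
    return fd

def safe_chown(root_fd, rel_path, uid, gid, cache):
    """chown rel_path (relative to root_fd) without following any symlink."""
    rel_dir, _, name = rel_path.rpartition("/")
    dir_fd = open_dir_nofollow(root_fd, rel_dir, cache)
    # follow_symlinks=False: if the last component is itself a symlink,
    # the link is changed, never the file it points to.
    os.chown(name, uid, gid, dir_fd=dir_fd, follow_symlinks=False)
```

The caller would open the filesystem root once, keep one cache per worker thread, and close the cached descriptors when done. A symlink swapped into the middle of a path makes os.open() raise OSError instead of silently redirecting the chown.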

I opened up RFEs at one point, I believe, for gpfs API calls to do this type 
of operation. I would ideally have liked a mechanism to do this based on inode 
number rather than path, which would help avoid issues of race conditions.

One of the gotchas to be aware of is quotas. My wrapper script would clone 
quotas from the old uid to the new uid. That's easy enough. However, keep in 
mind, if the uid is over their quota your chown operation will absolutely kill 
your cluster. Once a user is over their quota the filesystem seems to want to 
quiesce all of its accounting information on every filesystem operation for 
that user. I would check for adequate quota headroom for the user in question 
and abort if there wasn't enough.
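
The headroom check can be sketched roughly as follows. This is an illustration only: the QuotaState type and function names are hypothetical, the usage and limit numbers are assumed to have been collected beforehand (for example from the quota reporting tools), and treating a limit of 0 as "no limit set" is an assumption to verify on your system.

```python
from dataclasses import dataclass

@dataclass
class QuotaState:
    used: int        # current usage of the target uid (KB or inode count)
    soft_limit: int  # soft limit in the same unit; 0 assumed to mean "no limit"

def has_headroom(target, incoming):
    """True if the target uid can absorb `incoming` without passing its soft limit."""
    if target.soft_limit == 0:
        return True
    return target.used + incoming <= target.soft_limit

def check_migration(blocks, files, incoming_kb, incoming_files):
    """Abort (raise) before any chown if either block or inode headroom is short."""
    if not has_headroom(blocks, incoming_kb):
        raise RuntimeError("not enough block quota headroom, aborting")
    if not has_headroom(files, incoming_files):
        raise RuntimeError("not enough inode quota headroom, aborting")
```

Here incoming_kb and incoming_files would be the old uid's current usage, since that is what gets charged to the new uid as the chown proceeds.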

The ACL changes were much more tricky. There's no way, of which I'm aware, to 
atomically update ACL entries. You run the risk that you could clobber a user's 
ACL update if it occurs in the milliseconds between you reading the ACL and 
updating it as part of the UID/GID update. Thankfully we were using Posix ACLs 
which were easier for me to deal with programmatically. I still had the 
security concern over symbolic links appearing in paths to have their ACLs 
updated either maliciously or organically. I was able to deal with that by 
modifying libacl to implement ACL calls that used variants of xattr calls that 
took file descriptors as arguments and allowed me to throw nofollow flags. That 
code is here (https://github.com/aaronknister/acl/commits/nofollow). I couldn't 
take advantage of the GPFS APIs here to meet my requirements, so I just walked 
the filesystem tree in parallel, if I recall correctly, retrieved every ACL and 
updated if necessary.
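
To illustrate just the "update if necessary" step, here is a rough sketch that remaps numeric ids in the text form of a POSIX ACL (the kind of listing getfacl prints in numeric mode). It is not the modified-libacl code linked above. The function name remap_acl_text and the two mapping dicts are hypothetical, and the sketch does nothing about the read/update race or the symlink concerns, which still have to be handled when the ACL is read and written back.

```python
def remap_acl_text(acl_text, uid_map, gid_map):
    """Rewrite numeric user/group qualifiers in a POSIX ACL listing.

    uid_map and gid_map map old numeric ids to new ones. Returns
    (new_text, changed) so the caller can skip writing unchanged ACLs.
    """
    out = []
    changed = False
    for line in acl_text.splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            out.append(line)  # keep blank lines and "# file:" style comments
            continue
        fields = entry.split(":")
        # Default ACL entries carry a leading "default" field.
        offset = 1 if fields[0] == "default" else 0
        if len(fields) < offset + 3:
            out.append(line)
            continue
        tag, qualifier = fields[offset], fields[offset + 1]
        mapping = uid_map if tag == "user" else gid_map if tag == "group" else None
        # An empty qualifier (user::, group::) is the owner entry, which is
        # handled by the chown itself, so only numeric qualifiers are touched.
        if mapping is not None and qualifier.isdigit() and int(qualifier) in mapping:
            fields[offset + 1] = str(mapping[int(qualifier)])
            changed = True
        out.append(":".join(fields))
    return "\n".join(out), changed
```

Returning the changed flag keeps the write path short: most files carry no entry for a migrated id, so their ACLs are read but never rewritten, which also narrows the window in which a user's own update could be clobbered.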

If you're using NFS4 ACLs... I don't have an easy answer for you :)

We did manage to migrate UID numbers for several hundred users and half a 
billion inodes in a relatively small amount of time with the filesystem active. 
Some of the concerns about symbolic links can be mitigated if there are no 
users active on the filesystem while the migration is underway.

-Aaron

On Mon, Jun 8, 2020 at 2:01 PM Lohit Valleru <[email protected]> wrote:
Hello Everyone,

We are planning to migrate from LDAP to AD, and one of the best solutions was 
to change the uidNumber and gidNumber to what SSSD or Centrify would resolve.

May I know if anyone has come across a tool/tools that can change the 
uidNumbers and gidNumbers of billions of files efficiently and in a reliable 
manner?
We could spend some time to write a custom script, but wanted to know if a tool 
already exists.

Please do let me know if anyone else has come across a similar situation, and 
the steps/tools used to resolve the same.

Regards,
Lohit
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
