Oh, the one *huge* gotcha I thought I'd share: we wrote a Perl script to drive the migration, and part of its job was to clone quotas from the old UID numbers to the new ones. I upset our GPFS cluster during one migration in which the user was already past the quota's grace period, so after a certain point every chown() pushed the destination UID even further over its quota. The problem is that at that point every chown() operation caused GPFS to do cluster-wide quota-accounting RPCs. That hurt. It's worth making sure there are no quotas defined for the destination UID numbers, and if there are, that the data coming from the source UID numbers will fit.

-Aaron

On 8/2/17 9:00 PM, Aaron Knister wrote:
I'm a little late to the party here but I thought I'd share our recent experiences.

We recently completed a mass UID number migration (half a billion inodes) and developed two tools ("luke filewalker" and the "mmilleniumfacl") to get the job done. Both luke filewalker and the mmilleniumfacl are based heavily on the code in /usr/lpp/mmfs/samples/util/tsreaddir.c and /usr/lpp/mmfs/samples/util/tsinode.c.

luke filewalker targets traditional POSIX permissions whereas mmilleniumfacl targets POSIX ACLs. Both tools traverse the filesystem in parallel, and both, but particularly the latter, are extremely I/O intensive on your metadata disks.

The gist of luke filewalker is to scan the inode structures using the GPFS APIs and populate a mapping from inode number to UID and GID. It then walks the filesystem in parallel, looks up each inode number in an in-memory hash, and, if appropriate, changes ownership with the chown() API.

The mmilleniumfacl doesn't have the luxury of scanning for POSIX ACLs using the GPFS inode API, so it walks the filesystem and reads the ACL of every file, updating the ACL entries as appropriate.

I'm going to see if I can share the source code for both tools, although I don't know if I can post it here since it's based on existing IBM source code. Could someone from IBM chime in here? If I were to send the code to IBM, could they publish it, perhaps on the wiki?

-Aaron

On 6/30/17 11:20 AM, [email protected] wrote:
Hello,

We're trying to change most of our users' UIDs. Is there a clean way to migrate all of one user's files with, say, `mmapplypolicy`? We have to change the owner of around 273539588 files, and my estimate for the runtime is around 6 days.

What we've been doing is indexing all of the files and splitting them up by owner, which takes around an hour; then we lock each user out while we chown their files. I made it multithreaded, which oddly gave a 10% speedup, despite my expectation that multithreaded access from a single node would not give any speedup.

Generally I'm looking for advice on how to make the chowning faster. Would spreading the chown processes over multiple nodes improve performance? Should I skip stat()ing the files before running lchown() on them, since lchown() checks the file before changing it anyway? I saw mention of inodescan() in an old gpfsug email; it speeds up disk read access by not guaranteeing that the data is up to date. We have a maintenance day coming up where all users will be locked out, so the file handles(?) from GPFS's perspective will not be able to go stale. Is there a function with constraints similar to inodescan() that I can use to speed up this process?

Thank you for your time,

Luke
Storrs-HPC
University of Connecticut
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776