On 08/06/2020 18:44, Lohit Valleru wrote:
> Hello Everyone,
>
> We are planning to migrate from LDAP to AD, and one of the best solutions was to change the uidNumber and gidNumber to what SSSD or Centrify would resolve.
>
> May I know if anyone has come across a tool (or tools) that can change the uidNumbers and gidNumbers of billions of files efficiently and in a reliable manner?

Not to my knowledge.

> We could spend some time writing a custom script, but wanted to know if a tool already exists.


If you can be sure that all files under a given directory belong to a single user and you have no ACLs, then a whole bunch of "chown -R" commands would be reasonable; for example, when you have a lot of user home directories.

What I do in these scenarios is use a small SQLite database; in this case it would hold the directory I want to chown, the target UID and GID, and a status field. Initially I set the status field to -1, which indicates the entry has not been processed. The script sets the status field to -2 when it starts processing an entry and, on completion, sets it to the exit code of the command being run. This way, when the script has finished you can see any directory hierarchies that had a problem, and if it dies early you can see how far it got (the entry left at -2).
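
A minimal Python sketch of that bookkeeping, assuming a hypothetical table jobs(dir, uid, gid, status) where each row is handled by a single "chown -R uid:gid dir". The table name, column names and helper functions are illustrative, not taken from an existing tool; the runner argument only exists so the chown call can be swapped out for a dry run.

```python
import sqlite3
import subprocess

# Status values as described above; any other value is a command exit code.
PENDING = -1
IN_PROGRESS = -2


def init_db(conn):
    """Create the (hypothetical) jobs table: one row per directory tree to chown."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " dir TEXT PRIMARY KEY,"
        " uid INTEGER NOT NULL,"
        " gid INTEGER NOT NULL,"
        " status INTEGER NOT NULL DEFAULT -1)"
    )
    conn.commit()


def chown_tree(path, uid, gid):
    """Run a single 'chown -R uid:gid path' and return its exit code."""
    return subprocess.run(["chown", "-R", f"{uid}:{gid}", path]).returncode


def process_pending(conn, runner=chown_tree):
    """Process every row still marked PENDING, recording progress as we go."""
    rows = conn.execute(
        "SELECT dir, uid, gid FROM jobs WHERE status = ?", (PENDING,)
    ).fetchall()
    for path, uid, gid in rows:
        # Commit the -2 before running, so a crash leaves a visible marker.
        conn.execute("UPDATE jobs SET status = ? WHERE dir = ?", (IN_PROGRESS, path))
        conn.commit()
        rc = runner(path, uid, gid)
        # Record the command's exit code: 0 means success.
        conn.execute("UPDATE jobs SET status = ? WHERE dir = ?", (rc, path))
        conn.commit()
```

After a run, "SELECT dir, status FROM jobs WHERE status != 0" lists the trees that failed together with any row left at -2 by a crash.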

You can also do things like set all non-zero status codes back to -1 with a simple SQL update from the sqlite3 CLI, and then run again.
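
Assuming the same hypothetical jobs table, the reset is a single statement. Wrapped in Python it might look like this; rows already at 0 or -1 are excluded so the returned count is just the entries being retried.

```python
import sqlite3


def reset_failures(conn: sqlite3.Connection) -> int:
    """Put every failed (non-zero exit) or interrupted (-2) row back to pending (-1).

    Equivalent to running, from the sqlite3 CLI:
        UPDATE jobs SET status = -1 WHERE status NOT IN (0, -1);
    """
    cur = conn.execute("UPDATE jobs SET status = -1 WHERE status NOT IN (0, -1)")
    conn.commit()
    return cur.rowcount  # number of rows queued for another run
```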

If you don't need to modify ACLs but have mixed ownership within directory hierarchies, then a script is reasonable, but not a shell script. The overhead of exec'ing chown billions of times on individual files will be astronomical. You need something like Perl or Python, using the language's built-in chown facilities to avoid all those execs. That said, I suspect you would see a significant further speed-up from using C.
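
A minimal Python sketch of that approach, assuming the old-to-new mappings are already available as plain dicts of numeric IDs (building them from LDAP and SSSD/Centrify is not shown). Ownership is changed in-process with os.lchown, so there is no exec per file; hard-linked files are tracked by (device, inode) so each is remapped only once.

```python
import os
import stat


def _remap_entry(path, uid_map, gid_map, seen, dry_run):
    """Remap one entry's owner/group; return True if it changed (or would)."""
    st = os.lstat(path)  # lstat: inspect symlinks themselves, not their targets
    if st.st_nlink > 1 and not stat.S_ISDIR(st.st_mode):
        # Hard-linked files must be handled once, or an ID chain could remap them twice.
        key = (st.st_dev, st.st_ino)
        if key in seen:
            return False
        seen.add(key)
    new_uid = uid_map.get(st.st_uid, -1)  # -1 tells lchown to leave that ID alone
    new_gid = gid_map.get(st.st_gid, -1)
    if new_uid == -1 and new_gid == -1:
        return False
    if not dry_run:
        os.lchown(path, new_uid, new_gid)
    return True


def remap_tree(root, uid_map, gid_map, dry_run=False):
    """Walk root (without following symlinks) and remap ownership in-process.

    uid_map / gid_map map old numeric IDs to new ones. Returns the number of
    entries changed, or that would be changed when dry_run is True.
    """
    seen = set()
    changed = int(_remap_entry(root, uid_map, gid_map, seen, dry_run))
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if _remap_entry(os.path.join(dirpath, name), uid_map, gid_map, seen, dry_run):
                changed += 1
    return changed
```

Running it first with dry_run=True gives a count to sanity-check. On billions of files you would also want to split the work across top-level directories and several processes, which this sketch does not attempt.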

If you have ACLs to contend with, then I would definitely spend some time writing C code against the GPFS library. It will be a *LOT* faster than any script ever will be. Dealing with mmgetacl and mmputacl in a script is horrendous, and you will have billions of execs of each command.

As I understand it, GPFS stores each ACL once and each file then points to it. Theoretically it would be possible to modify just the stored ACLs for a very speedy update of all the ACLs on the files and directories. However, I would imagine you would need to engage IBM and bend over while they empty your wallet for that option :-)

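The biggest issue to take care of, IMHO, is this: do any of the input UID/GID numbers exist in the output set? If so, life just got a lot harder, as you don't get a second chance to run the script/program if there is a problem. A file already changed to a new ID that is also one of the old IDs would be remapped again on a second run.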
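
A quick way to check this before touching any files, sketched in Python under the assumption that the planned mapping is held as a dict from old to new numeric IDs (run it once for UIDs and once for GIDs). For example, with 500 to 1000 and 1000 to 2000, a file owned by 500 ends up at 1000 after the first pass and at 2000 after a rerun. Identity entries are ignored here as a judgement call, since remapping them is a no-op.

```python
def find_clashes(id_map):
    """Return old IDs that are also the target of some other mapping.

    id_map maps old numeric IDs to new ones. Identity entries (old == new)
    are ignored because remapping them again changes nothing.
    """
    targets = {new for old, new in id_map.items() if old != new}
    return sorted(old for old, new in id_map.items() if old != new and old in targets)
```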

In that case I would be very tempted to remove such clashes prior to the main change. You might be able to do that incrementally before the main switch, updating your LDAP to match.
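
One way to plan that incremental step, sketched in Python: first move each clashing old ID to a temporary ID from an unused range, then run the main change with the temporary IDs substituted. The default temp_start of 900000 and the in_use set are assumptions; you would pick a range you know is free in both LDAP and AD. Because the temporary IDs lie outside every existing set, neither step contains a clash.

```python
def plan_two_phase(id_map, in_use, temp_start=900000):
    """Split id_map into a pre-migration step and a clash-free main mapping.

    id_map:     old numeric ID -> new numeric ID
    in_use:     every ID already allocated anywhere (assumed supplied by caller)
    temp_start: start of a range assumed to be free for temporary IDs
    Returns (pre, main): apply pre (and update LDAP to match) first, then main.
    """
    targets = {new for old, new in id_map.items() if old != new}
    clashing = sorted(old for old, new in id_map.items() if old != new and old in targets)
    taken = set(in_use) | set(id_map) | set(id_map.values())
    pre = {}
    next_id = temp_start
    for old in clashing:
        while next_id in taken:  # skip anything already allocated
            next_id += 1
        pre[old] = next_id
        taken.add(next_id)
    # In the main step, clashing owners are found under their temporary IDs.
    main = {pre.get(old, old): new for old, new in id_map.items()}
    return pre, main
```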

Finally, be aware that if you are using TSM for backup, you will probably need to back up every file again after the change of ownership.

JAB.

--
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss