I think you found your answer: TSM tracks files by pathname. So if a file had path /w/x/y/z on Monday but was moved to /w/x/q/p on Tuesday, how would TSM "know" it was the same file? It wouldn't! To TSM it looks as if you've deleted the first file and created the second.
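To make that concrete, here is a toy sketch of comparing two snapshots keyed only by pathname. It is NOT how TSM is implemented; the function name diff_by_path and the dictionary layout are made up purely for illustration.

```python
def diff_by_path(before, after):
    """Compare two snapshots that are keyed by pathname only (hypothetical helper)."""
    deleted = sorted(set(before) - set(after))   # paths that vanished
    created = sorted(set(after) - set(before))   # paths that appeared
    return deleted, created

# Same contents, but the file moved between Monday and Tuesday.
monday = {"/w/x/y/z": "contents"}
tuesday = {"/w/x/q/p": "contents"}

deleted, created = diff_by_path(monday, tuesday)
print("deleted:", deleted)
print("created:", created)
```

With nothing but pathnames to go on, the move shows up as one deletion plus one creation, and the "new" file gets backed up again.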
Technically there are other possibilities, and some backup systems may use them, but NOT TSM:

1) Record the inode number together with the generation number and/or creation timestamp. Within a given POSIX-ish file system, that uniquely identifies the file.

2) Record a strong (cryptographic-quality) checksum (hash) of the contents of the file. If two files have the same hash, the odds are that their contents are identical, so we can use the same backup data for both and don't have to store an extra copy in the backup system. To make the odds really, really "long" you want to take the "birthday paradox" into account and use lots and lots of bits. Those long odds can also be compared to the probability of losing a file due to a bug, an I/O error, an accident, or a disaster... For example, SHA-256 might be strong and long enough for you to believe in.

Backup is not generally a cryptographic game, so perhaps you should not worry much about some evildoer purposely trying to confound your backup system. OTOH, if you have users who are adversaries, all backing up into the same system... in theory one might "destroy" another's backup.

Hashing saves transmission and storage of duplicates, but of course the backup system has to read the contents of each suspected new file and compute the hash...
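For what it's worth, both ideas can be sketched in a few lines of Python. This is only an illustration under some assumptions, not any backup product's code: the helper names file_identity, content_hash and birthday_collision_bound are invented here, and because os.stat does not expose the inode generation number, the identity below is just (device, inode), which a file system may reuse after a file is deleted. A real tool would need the generation number or creation timestamp as well.

```python
import hashlib
import os
import tempfile

def file_identity(path):
    """(device, inode) pair; no generation number, so inode reuse is NOT detected."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)

def content_hash(path, chunk_size=1 << 20):
    """SHA-256 of the file contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def birthday_collision_bound(n_files, bits=256):
    """Union bound on the chance that any two of n random hashes collide."""
    return n_files * (n_files - 1) / 2 / 2**bits

# Demo: move a file within one file system and compare before and after.
with tempfile.TemporaryDirectory() as d:
    old = os.path.join(d, "z")
    new = os.path.join(d, "p")
    with open(old, "wb") as f:
        f.write(b"same bytes")
    id_before, hash_before = file_identity(old), content_hash(old)
    os.rename(old, new)
    id_after, hash_after = file_identity(new), content_hash(new)

print("same identity after move:", id_before == id_after)
print("same hash after move:", hash_before == hash_after)
print("collision bound for 10**12 files:", birthday_collision_bound(10**12))
```

Note the cost asymmetry: the identity check is a single stat call, while the hash requires reading every byte of the file.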
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
