I've endeavoured to improve on my previous patch, by adopting a heuristic approach as follows:
1) First and foremost, for each file in the backup copy I try and assign permissions based on those of the corresponding file in the source repository, if the latter does indeed exist. 2) Should this fail, I infer the permissions from the context, i.e. I choose one "typical file" from the same folder. 3) Should the folder itself be empty, I assign permissions based on the current umask. Do you deem such an approach to be correct? Regards, Roberto On Thu, Oct 22, 2015 at 4:20 PM, Roberto Reale <roberto.real...@gmail.com> wrote: > Which behaviour would you deem correct? > > Just assigning a default permission set when the original file cannot > be found, or inferring it from the file type (e.g., revision files > should not be executable)? > > Regards, > Roberto > > On Thu, Oct 22, 2015 at 1:12 PM, Philip Martin > <philip.mar...@wandisco.com> wrote: >> Roberto Reale <roberto.real...@gmail.com> writes: >> >>> Do you think the patch can be improved in any regards? >>> >>> I would like to "breed it up", so to say, to an acceptable status. >> >> Your patch assumes that all the files in the backup are still present in >> the source repository and that is not always true. For example pack on >> a FSFS repository could remove revision files, a commit to a BDB >> repository could delete a log file. Any such files will cause archive >> creation to fail with your patch. >> >> -- >> Philip Martin >> WANdisco
[[[ * tools/backup/hot-backup.py.in: Always save correct file permissions in tar archives. ]]] Index: tools/backup/hot-backup.py.in =================================================================== --- tools/backup/hot-backup.py.in (revision 1706408) +++ tools/backup/hot-backup.py.in (working copy) @@ -292,7 +292,61 @@ if archive_type: try: import tarfile tar = tarfile.open(archive_path, 'w:' + archive_type) - tar.add(backup_subdir, os.path.basename(backup_subdir)) + + # Build up the archive file by file, paying special care in replicating + # the permissions for each file from the source repository itself rather + # than from the hotcopy dump. + # + # This can be useful in such cases when the source repository is on a + # POSIX filesystem and the hotcopy dump is written e.g. on a FAT or a + # CIFS filesystem, thus leading to a possible "permission leakage" were + # the archive built directly from hotcopy dump via tar.add(). + # + # Since, however, nothing guarantees that all the files in the backup are + # still present in the source repository (due e.g. to revision files + # being removed by pack on a FSFS repository, or to a log file being + # deleted by a commit to a BDB repository), we are forced to resort to a + # heuristic approach. + # + # First and foremost, we try and assign permissions based on those of the + # corresponding file in the source repository (case marked S below). + # Should this fail, we infer them from the context, i.e. we choose one + # "typical file" from the same folder (case C). Should the folder itself + # be empty, we assign permissions based on the current umask (case U). + + for dirpath, dirnames, filenames in os.walk(backup_subdir, followlinks=True): + for leafname in dirnames + filenames: + backup_based_path = os.path.join(dirpath, leafname) + relpath = os.path.relpath(backup_based_path, backup_subdir) + repo_based_path = os.path.join(repo_dir, relpath) + if not os.path.exists(repo_based_path): + # (C) Infer permissions from some "typical file" to be found + # underneath the same folder. + def choose_one_file_in_path(path): + files_in_path = [f for f in os.listdir(path) + if os.path.isfile(os.path.join(path, f))] + if files_in_path: + return files_in_path[0] + repo_based_p_path = os.path.dirname(repo_based_path) # "parent path" + typical_file_in_path = choose_one_file_in_path(repo_based_p_path) + if not typical_file_in_path: + repo_based_path = None + repo_based_path = os.path.join(repo_based_p_path, typical_file_in_path) + tarinfo = tar.gettarinfo(backup_based_path, os.path.join(repo, relpath)) + if not repo_based_path: + # (S) Assign permissions based on those of the corresponding file + # in the source repository. + tarinfo.mode = os.stat(repo_based_path).st_mode + else: + # (U) Assign permissions based on the current umask. + current_umask = os.umask(0) + os.umask(current_umask) + tarinfo.mode = 0777 ^ current_umask + if tarinfo.isreg(): + with open(backup_based_path, "rb") as f: + tar.addfile(tarinfo, f) + else: + tar.addfile(tarinfo) tar.close() except ImportError, e: err_msg = "Import failed: " + str(e)