I've endeavoured to improve on my previous patch, by adopting a
heuristic approach as follows:

1) First and foremost, for each file in the backup copy I try and
assign permissions based on those of the corresponding file in the
source repository, if the latter does indeed exist.
2) Should this fail, I infer the permissions from the context, i.e. I
choose one "typical file" from the same folder.
3) Should the folder itself be empty, I assign permissions based on
the current umask.

Do you deem such an approach to be correct?

Regards,
Roberto

On Thu, Oct 22, 2015 at 4:20 PM, Roberto Reale
<roberto.real...@gmail.com> wrote:
> Which behaviour would you deem correct?
>
> Just assigning a default permission set when the original file cannot
> be found, or inferring it from the file type (e.g., revision files
> should not be executable)?
>
> Regards,
> Roberto
>
> On Thu, Oct 22, 2015 at 1:12 PM, Philip Martin
> <philip.mar...@wandisco.com> wrote:
>> Roberto Reale <roberto.real...@gmail.com> writes:
>>
>>> Do you think the patch can be improved in any regards?
>>>
>>> I would like to "breed it up", so to say, to an acceptable status.
>>
>> Your patch assumes that all the files in the backup are still present in
>> the source repository and that is not always true.  For example pack on
>> a FSFS repository could remove revision files, a commit to a BDB
>> repository could delete a log file.  Any such files will cause archive
>> creation to fail with your patch.
>>
>> --
>> Philip Martin
>> WANdisco
[[[

* tools/backup/hot-backup.py.in: Always save correct file
  permissions in tar archives.

]]]


Index: tools/backup/hot-backup.py.in
===================================================================
--- tools/backup/hot-backup.py.in       (revision 1706408)
+++ tools/backup/hot-backup.py.in       (working copy)
@@ -292,7 +292,61 @@ if archive_type:
     try:
       import tarfile
       tar = tarfile.open(archive_path, 'w:' + archive_type)
-      tar.add(backup_subdir, os.path.basename(backup_subdir))
+
+      # Build up the archive file by file, paying special care in replicating
+      # the permissions for each file from the source repository itself rather
+      # than from the hotcopy dump.
+      #
+      # This can be useful in such cases when the source repository is on a
+      # POSIX filesystem and the hotcopy dump is written e.g. on a FAT or a
+      # CIFS filesystem, thus leading to a possible "permission leakage" were
+      # the archive built directly from hotcopy dump via tar.add().
+      #
+      # Since, however, nothing guarantees that all the files in the backup are
+      # still present in the source repository (due e.g. to revision files
+      # being removed by pack on a FSFS repository, or to a log file being
+      # deleted by a commit to a BDB repository), we are forced to resort to a
+      # heuristic approach.
+      #
+      # First and foremost, we try and assign permissions based on those of the
+      # corresponding file in the source repository (case marked S below).
+      # Should this fail, we infer them from the context, i.e. we choose one
+      # "typical file" from the same folder (case C).  Should the folder itself
+      # be empty, we assign permissions based on the current umask (case U).
+
+      for dirpath, dirnames, filenames in os.walk(backup_subdir, 
followlinks=True):
+        for leafname in dirnames + filenames:
+          backup_based_path = os.path.join(dirpath, leafname)
+          relpath = os.path.relpath(backup_based_path, backup_subdir)
+          repo_based_path = os.path.join(repo_dir, relpath)
+          if not os.path.exists(repo_based_path):
+            # (C) Infer permissions from some "typical file" to be found
+            #     underneath the same folder.
+            def choose_one_file_in_path(path):
+              files_in_path = [f for f in os.listdir(path)
+                               if os.path.isfile(os.path.join(path, f))]
+              if files_in_path:
+                return files_in_path[0]
+            repo_based_p_path = os.path.dirname(repo_based_path)  # "parent 
path"
+            typical_file_in_path = choose_one_file_in_path(repo_based_p_path)
+            if not typical_file_in_path:
+              repo_based_path = None
+            repo_based_path = os.path.join(repo_based_p_path, 
typical_file_in_path)
+          tarinfo = tar.gettarinfo(backup_based_path, os.path.join(repo, 
relpath))
+          if not repo_based_path:
+            # (S) Assign permissions based on those of the corresponding file
+            #     in the source repository.
+            tarinfo.mode = os.stat(repo_based_path).st_mode
+          else:
+            # (U) Assign permissions based on the current umask.
+            current_umask = os.umask(0)
+            os.umask(current_umask)
+            tarinfo.mode = 0777 ^ current_umask
+          if tarinfo.isreg():
+            with open(backup_based_path, "rb") as f:
+              tar.addfile(tarinfo, f)
+          else:
+            tar.addfile(tarinfo)
       tar.close()
     except ImportError, e:
       err_msg = "Import failed: " + str(e)

Reply via email to