This is an automated email from the ASF dual-hosted git repository.
potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git
The following commit(s) were added to refs/heads/main by this push:
new d5c0f32a87 Fix reproducibility of source-tarballs prepared as part of release (#36819)
d5c0f32a87 is described below
commit d5c0f32a87775984d156f0a3827f457808db7afc
Author: Jarek Potiuk <[email protected]>
AuthorDate: Tue Jan 16 22:02:40 2024 +0100
Fix reproducibility of source-tarballs prepared as part of release (#36819)
The source tarball was supposed to be reproducible, but it was only
"almost" reproducible. It turned out that we forgot that group
permissions can change depending on the umask that is configured
in the system (because applying the umask is the default git
behaviour when files are checked out). This means that the
packages were reproducible only if the people who built them
had the same umask set.
Since it is unlikely that the default umask removes any of the owner's
rwx permissions, tools that aim to provide reproducibility approach it
by setting the umask to clear all permissions for group and other -
this way the files in the archive have only owner permissions set.
This is what this PR does - we set tar.umask to 0077 when running the
`git archive` command, which effectively clears all the group
and other permissions.
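
As an illustration (not part of this commit), the effect of a 0077 mask
can be sketched in Python. The helper name `apply_umask` is hypothetical;
it only mimics how a umask clears permission bits and does not call git.

```python
import stat


def apply_umask(mode: int, umask: int) -> int:
    """Clear every permission bit that is set in the umask (hypothetical helper)."""
    return mode & ~umask & 0o777


# Two checkouts of the same file made under different umasks
# (0002 vs 0022) differ only in the group-write bit.
checkout_a = apply_umask(0o666, 0o002)  # rw-rw-r--
checkout_b = apply_umask(0o666, 0o022)  # rw-r--r--

# Masking both with 0077 leaves owner-only permissions, so the
# difference between the two checkouts disappears.
for mode in (checkout_a, checkout_b):
    masked = apply_umask(mode, 0o077)
    print(stat.filemode(stat.S_IFREG | mode), "->", stat.filemode(stat.S_IFREG | masked))
```

Both lines end in `-rw-------`, which is why the mask removes the
dependency on the builder's umask.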
In this PR we also fix the FutureDeprecation warning raised in newer
versions of Python, where GzipFile should take keyword parameters
rather than positional ones.
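
A minimal sketch of the keyword-only call style, assuming an in-memory
buffer instead of the real output file. The function name
`gzip_deterministically` is hypothetical. Note that the first positional
parameter of `gzip.GzipFile` is `filename`, so spelling out `mode=`
avoids any ambiguity.

```python
import gzip
import io


def gzip_deterministically(payload: bytes) -> bytes:
    """Compress payload with a fixed mtime so repeated runs give identical bytes."""
    buffer = io.BytesIO()
    # All arguments are passed by keyword; mtime=0 keeps the current
    # time out of the gzip header.
    with gzip.GzipFile(fileobj=buffer, mode="wb", mtime=0) as gzip_file:
        gzip_file.write(payload)
    return buffer.getvalue()


first = gzip_deterministically(b"example payload")
second = gzip_deterministically(b"example payload")
print(first == second)
```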
---
dev/breeze/src/airflow_breeze/commands/release_candidate_command.py | 2 ++
dev/breeze/src/airflow_breeze/utils/reproducible.py | 2 +-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/dev/breeze/src/airflow_breeze/commands/release_candidate_command.py b/dev/breeze/src/airflow_breeze/commands/release_candidate_command.py
index 11cba8d9f5..5408420a30 100644
--- a/dev/breeze/src/airflow_breeze/commands/release_candidate_command.py
+++ b/dev/breeze/src/airflow_breeze/commands/release_candidate_command.py
@@ -88,6 +88,8 @@ def tarball_release(version: str, version_without_rc: str, source_date_epoch: in
run_command(
[
"git",
+ "-c",
+ "tar.umask=0077",
"archive",
"--format=tar.gz",
f"{version}",
diff --git a/dev/breeze/src/airflow_breeze/utils/reproducible.py b/dev/breeze/src/airflow_breeze/utils/reproducible.py
index a85d871a3c..ca272ea9ec 100644
--- a/dev/breeze/src/airflow_breeze/utils/reproducible.py
+++ b/dev/breeze/src/airflow_breeze/utils/reproducible.py
@@ -100,7 +100,7 @@ def archive_deterministically(dir_to_archive, dest_archive, prepend_path=None, t
# packaging (in case of exceptional situations like running out of disk space).
temp_file = f"{dest_archive}.temp~"
with os.fdopen(os.open(temp_file, os.O_WRONLY | os.O_CREAT, 0o644), "wb") as out_file:
- with gzip.GzipFile("wb", fileobj=out_file, mtime=0) as gzip_file:
+ with gzip.GzipFile(fileobj=out_file, mtime=0, mode="wb") as gzip_file:
with tarfile.open(fileobj=gzip_file, mode="w:") as tar_file:
for entry in file_list:
arcname = entry