This is an automated email from the ASF dual-hosted git repository.
slawrence pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/daffodil.git
The following commit(s) were added to refs/heads/main by this push:
new 3a8d6bab3 Improve reproducibility for release candidate artifacts
3a8d6bab3 is described below
commit 3a8d6bab342091160b0d3ea72b00c63334b02216
Author: Steve Lawrence <[email protected]>
AuthorDate: Tue Feb 11 11:02:22 2025 -0500
Improve reproducibility for release candidate artifacts
The SBT native packager plugin is used to build helper binaries for
release candidates. In some cases these binaries are difficult to check
for reproducibility due to metadata that is embedded in the files. This
modifies our SBT configurations where possible to remove as much
variance as possible.
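
As a rough illustration of what a reproducibility check involves, the
following Scala sketch compares two builds of the same artifact by SHA-256
digest. The object and method names (ArtifactCompare, sha256Hex,
sameContent) are hypothetical and are not part of this commit; any embedded
metadata that differs between builds makes the digests differ.

```scala
import java.nio.file.{Files, Path}
import java.security.MessageDigest

// Hypothetical helper, not part of the Daffodil build.
object ArtifactCompare {

  /** Hex-encoded SHA-256 digest of a byte array. */
  def sha256Hex(bytes: Array[Byte]): String =
    MessageDigest
      .getInstance("SHA-256")
      .digest(bytes)
      .map(b => "%02x".format(b))
      .mkString

  /** True only when both files are byte-for-byte identical (by digest). */
  def sameContent(a: Path, b: Path): Boolean =
    sha256Hex(Files.readAllBytes(a)) == sha256Hex(Files.readAllBytes(b))
}
```

A digest mismatch only says that two files differ, not where. Format-aware
tools such as rpmdiff and msidiff, mentioned elsewhere in this commit, are
better suited to locating the differing fields.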
For tar, this adds options (based on tar reproducibility documentation)
that set things like user IDs and modification times to consistent
values. Note that the --sort=name option does not work on the version of
tar available on GitHub Windows and macOS systems, so we now only
generate the tar on Linux CI.
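
The tar options described above can be sketched as a standalone Scala
function. This is a minimal sketch, not the code in build.sbt: the object
and method names are hypothetical, and it uses java.time in place of the
SimpleDateFormat used in the actual change. The option list and the
SOURCE_DATE_EPOCH handling follow the build.sbt change in this commit.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

// Hypothetical helper, not part of the Daffodil build.
object ReproducibleTar {

  // Same pattern as the build.sbt change: date, time, and numeric UTC offset.
  private val mtimeFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ssZ")

  /** Options to prepend to the tar command.
    *
    * @param sourceDateEpoch value of SOURCE_DATE_EPOCH (seconds since the
    *                        Unix epoch), if set in the environment
    */
  def options(sourceDateEpoch: Option[String]): Seq[String] = {
    // Only pin modification times when SOURCE_DATE_EPOCH is provided.
    val mtime = sourceDateEpoch.map { epoch =>
      val utc = Instant.ofEpochSecond(epoch.toLong).atOffset(ZoneOffset.UTC)
      "--mtime=" + mtimeFormat.format(utc)
    }
    Seq("--sort=name", "--owner=0", "--group=0", "--numeric-owner") ++ mtime
  }
}
```

For example, ReproducibleTar.options(sys.env.get("SOURCE_DATE_EPOCH"))
returns only the four fixed options when the variable is unset, so archive
modification times are left untouched in that case.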
For rpm, this sets a number of macros (e.g. buildhost) so that the
embedded values in the RPM are always the same regardless of the actual
environment properties, which can differ between systems. We also change
the shebang in the bash script to be more portable. Note that there are still
some macros in RPM that cannot be controlled by %defines, so in general
the same or a similar environment is needed for reproducible RPMs.
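
The way the macros reach rpmbuild can be sketched as follows: %define lines
are appended to the package description text. The macro names and values
are the ones set in the build.sbt change in this commit; the object and
method names (RpmDefines, withDefines) are hypothetical and only illustrate
the string assembly.

```scala
// Hypothetical helper, not part of the Daffodil build.
object RpmDefines {

  // Macro names and values taken from the build.sbt change in this commit.
  val macros: Seq[(String, String)] = Seq(
    "_source_payload" -> "w0.gzdio",   // gzip payload, compression level 0
    "_binary_payload" -> "w0.gzdio",
    "_buildhost" -> "daffodil.build",  // fixed build host name
    "optflags" -> "%{nil}"             // empty compiler flags
  )

  /** Appends one %define line per macro after the given description. */
  def withDefines(description: String): String = {
    val defines = macros.map { case (name, value) => s"%define $name $value" }
    description + "\n" + defines.mkString("\n") + "\n"
  }
}
```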
For msi, there is nothing more we can do. There are only a couple of
timestamps and UUIDs that cannot be changed. msidiff is a useful tool
that shows these are the only differences.
Zip artifacts are already reproducible and do not need changes.
DAFFODIL-2971
---
.github/workflows/main.yml | 8 ++---
daffodil-cli/build.sbt | 57 +++++++++++++++++++++++++++++---
daffodil-cli/src/templates/bash-template | 2 +-
3 files changed, 57 insertions(+), 10 deletions(-)
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index d874dbd68..7d249db4d 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -184,12 +184,12 @@ jobs:
- name: Build Documentation
run: $SBT unidoc genTunablesDoc
- - name: Package Zip & Tar
- run: $SBT daffodil-cli/Universal/packageBin daffodil-cli/Universal/packageZipTarball
+ - name: Package Zip
+ run: $SBT daffodil-cli/Universal/packageBin
- - name: Package RPM (Linux)
+ - name: Package RPM & Tar (Linux)
if: runner.os == 'Linux'
- run: $SBT daffodil-cli/Rpm/packageBin
+ run: $SBT daffodil-cli/Rpm/packageBin daffodil-cli/Universal/packageZipTarball
############################################################
# Check
diff --git a/daffodil-cli/build.sbt b/daffodil-cli/build.sbt
index 77ebb1156..5d71d222e 100644
--- a/daffodil-cli/build.sbt
+++ b/daffodil-cli/build.sbt
@@ -30,6 +30,27 @@ Linux / packageName := executableScriptName.value
Rpm / packageName := "apache-" + executableScriptName.value
Windows / packageName := executableScriptName.value
+val optSourceDateEpoch = scala.util.Properties.envOrNone("SOURCE_DATE_EPOCH")
+
+// prepend additional options to the tar command for reproducibility. We prepend because the
+// default value of this setting includes the -f option at the end, which needs to stay at the
+// end since sbt-native-packager provides the archive file immediately after
+Universal / packageZipTarball / universalArchiveOptions := {
+ val optMtime = optSourceDateEpoch.map { epoch =>
+ val fmt = new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ssZ")
+ fmt.setTimeZone(java.util.TimeZone.getTimeZone("UTC"))
+ val mtime = fmt.format(new java.util.Date(epoch.toLong * 1000))
+ s"--mtime=$mtime"
+ }
+ val newOptions = Seq(
+ "--sort=name",
+ "--owner=0",
+ "--group=0",
+ "--numeric-owner"
+ ) ++ optMtime
+ newOptions ++ (Universal / packageZipTarball / universalArchiveOptions).value
+}
+
Universal / mappings ++= Seq(
baseDirectory.value / "bin.LICENSE" -> "LICENSE",
baseDirectory.value / "bin.NOTICE" -> "NOTICE",
@@ -83,12 +104,38 @@ carried by data processing frameworks so as to bypass any XML/JSON overheads.
// rpmbuild behavior, we can simply append them to the RPM description and
// things still work as expected.
//
-// In this case, we want to disable zstd compression which isn't supported by
-// older versions of RPM. So we add the following special rpm %define's to use
-// gzip compression instead, which is supported by all versions of RPM.
+// Older versions of RPM do not support zstd compression. To disable this we can
+// define _source_payload and _binary_payload to use gzip compression.
+// Additionally, the bulk of the RPM is jars which are already compressed and
+// won't really compress any further, so we set the compression level to zero
+// for faster builds.
+//
+// _buildhost is set to ensure reproducible builds regardless of the hostname of
+// the system where we are building the RPM.
+//
+// optflags is set to empty for reproducible builds--different systems use
+// different values of optflags and store the value in the RPM metadata. It
+// doesn't matter that we set it to nil because the macro is only used for
+// things like CFLAGS, CXXFLAGS, etc. and the way we use rpmbuild it does not use
+// these flags, since it just packages files already built by SBT.
+//
+// Even with these above settings, different systems still might create RPMs
+// with different internal tags. For example, the CLASSDICT and FILECLASS tags
+// cannot be controlled by the spec file, and include human readable
+// descriptions of each installed file. These descriptions are created by
+// libmagic and can differ depending on the version of libmagic on a system. RPM
+// also includes a PLATFORM tag that usually includes the distribution (e.g.
+// redhat vs debian), again something the spec file cannot change. And RPM also
+// includes the version of RPM used to build the RPM file. All that to say that
+// although we can minimize differences by changing some macros, the same or very
+// similar environment is still needed for byte exact reproducible RPM builds.
+// However, this is usually enough for the rpmdiff tool to report no differences
+// since it doesn't look at tags that don't really matter.
Rpm / packageDescription := (Rpm / packageDescription).value + """
-%define _source_payload w9.gzdio
-%define _binary_payload w9.gzdio
+%define _source_payload w0.gzdio
+%define _binary_payload w0.gzdio
+%define _buildhost daffodil.build
+%define optflags %{nil}
"""
Rpm / version := {
diff --git a/daffodil-cli/src/templates/bash-template b/daffodil-cli/src/templates/bash-template
index ad06ddf45..28cca314a 100755
--- a/daffodil-cli/src/templates/bash-template
+++ b/daffodil-cli/src/templates/bash-template
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with