This is an automated email from the ASF dual-hosted git repository.

slawrence pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/daffodil.git


The following commit(s) were added to refs/heads/main by this push:
     new 3a8d6bab3 Improve reproducibility for release candidate artifacts
3a8d6bab3 is described below

commit 3a8d6bab342091160b0d3ea72b00c63334b02216
Author: Steve Lawrence <[email protected]>
AuthorDate: Tue Feb 11 11:02:22 2025 -0500

    Improve reproducibility for release candidate artifacts
    
    The SBT native packager plugin is used to build helper binaries for
    release candidates. In some cases these binaries are difficult to check
    for reproducibility due to metadata that is embedded in the files. This
    commit modifies our SBT configuration to remove as much of that
    variance as possible.
    
    For tar, this adds options (based on tar reproducibility documentation)
    that set things like user IDs and modification times to consistent
    values. Note that the --sort=name option does not work on the version of
    tar available on GitHub Windows and macOS systems, so we now only
    generate the tar on Linux CI.
    
    For rpm, this sets a number of macros (e.g. buildhost) so that the
    embedded values in the RPM are always the same regardless of the actual
    environment properties, which can differ between systems. We also change
    the shebang in the bash script to be more portable. Note that there are
    still some macros in RPM that cannot be controlled by %defines, so in
    general the same or a similar environment is needed for reproducible RPMs.
    
    For msi, there is nothing more we can do. There are only a couple of
    timestamps and UUIDs that cannot be changed. msidiff is a useful tool
    that shows these are the only differences.
    
    Zip artifacts are already reproducible and do not need changes.
    
    DAFFODIL-2971
---
 .github/workflows/main.yml               |  8 ++---
 daffodil-cli/build.sbt                   | 57 +++++++++++++++++++++++++++++---
 daffodil-cli/src/templates/bash-template |  2 +-
 3 files changed, 57 insertions(+), 10 deletions(-)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index d874dbd68..7d249db4d 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -184,12 +184,12 @@ jobs:
       - name: Build Documentation
         run: $SBT unidoc genTunablesDoc
 
-      - name: Package Zip & Tar
-        run: $SBT daffodil-cli/Universal/packageBin daffodil-cli/Universal/packageZipTarball
+      - name: Package Zip
+        run: $SBT daffodil-cli/Universal/packageBin
 
-      - name: Package RPM (Linux)
+      - name: Package RPM & Tar (Linux)
         if: runner.os == 'Linux'
-        run: $SBT daffodil-cli/Rpm/packageBin
+        run: $SBT daffodil-cli/Rpm/packageBin daffodil-cli/Universal/packageZipTarball
 
       ############################################################
       # Check
diff --git a/daffodil-cli/build.sbt b/daffodil-cli/build.sbt
index 77ebb1156..5d71d222e 100644
--- a/daffodil-cli/build.sbt
+++ b/daffodil-cli/build.sbt
@@ -30,6 +30,27 @@ Linux / packageName := executableScriptName.value
 Rpm / packageName := "apache-" + executableScriptName.value
 Windows / packageName := executableScriptName.value
 
+val optSourceDateEpoch = scala.util.Properties.envOrNone("SOURCE_DATE_EPOCH")
+
+// prepend additional options to the tar command for reproducibility. We prepend because the
+// default value of this setting includes the -f option at the end, which needs to stay at the
+// end since sbt-native-packager provides the archive file immediately after
+Universal / packageZipTarball / universalArchiveOptions := {
+  val optMtime = optSourceDateEpoch.map { epoch =>
+    val fmt = new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ssZ")
+    fmt.setTimeZone(java.util.TimeZone.getTimeZone("UTC"))
+    val mtime = fmt.format(new java.util.Date(epoch.toLong * 1000))
+    s"--mtime=$mtime"
+  }
+  val newOptions = Seq(
+    "--sort=name",
+    "--owner=0",
+    "--group=0",
+    "--numeric-owner"
+  ) ++ optMtime
+  newOptions ++ (Universal / packageZipTarball / universalArchiveOptions).value
+}
+
 Universal / mappings ++= Seq(
   baseDirectory.value / "bin.LICENSE" -> "LICENSE",
   baseDirectory.value / "bin.NOTICE" -> "NOTICE",
@@ -83,12 +104,38 @@ carried by data processing frameworks so as to bypass any XML/JSON overheads.
 // rpmbuild behavior, we can simply append them to the RPM description and
 // things still work as expected.
 //
-// In this case, we want to disable zstd compression which isn't supported by
-// older versions of RPM. So we add the following special rpm %define's to use
-// gzip compression instead, which is supported by all versions of RPM.
+// Older versions of RPM do not support zstd compression. To disable this we can
+// define _source_payload and _binary_payload to use gzip compression.
+// Additionally, the bulk of the RPM is jars which are already compressed and
+// won't really compress any further, so we set the compression level to zero
+// for faster builds.
+//
+// _buildhost is set to ensure reproducible builds regardless of the hostname of
+// the system where we are building the RPM.
+//
+// optflags is set to empty for reproducible builds--different systems use
+// different values of optflags and store the value in the RPM metadata. It
+// doesn't matter that we set it to nil because the macro is only used for
+// things like CFLAGS, CXXFLAGS, etc. and the way we use rpmbuild it does not use
+// these flags, since it just packages files already built by SBT.
+//
+// Even with these above settings, different systems still might create RPMs
+// with different internal tags. For example, the CLASSDICT and FILECLASS tags
+// cannot be controlled by the spec file, and include human readable
+// descriptions of each installed file. These descriptions are created by
+// libmagic and can differ depending on the version of libmagic on a system. RPM
+// also includes a PLATFORM tag that usually includes the distribution (e.g.
+// redhat vs debian), again something the spec file cannot change. And RPM also
+// includes the version of RPM used to build the RPM file. All that to say that
+// although we can minimize differences by changing some macros, the same or very
+// similar environment is still needed for byte exact reproducible RPM builds.
+// However, this is usually enough for the rpmdiff tool to report no differences
+// since it doesn't look at tags that don't really matter.
 Rpm / packageDescription := (Rpm / packageDescription).value + """
-%define _source_payload w9.gzdio
-%define _binary_payload w9.gzdio
+%define _source_payload w0.gzdio
+%define _binary_payload w0.gzdio
+%define _buildhost daffodil.build
+%define optflags %{nil}
 """
 
 Rpm / version := {
diff --git a/daffodil-cli/src/templates/bash-template b/daffodil-cli/src/templates/bash-template
index ad06ddf45..28cca314a 100755
--- a/daffodil-cli/src/templates/bash-template
+++ b/daffodil-cli/src/templates/bash-template
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/usr/bin/env bash
 #
 # Licensed to the Apache Software Foundation (ASF) under one or more
 # contributor license agreements.  See the NOTICE file distributed with
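
As a rough illustration of the tar options described in the commit message, the sketch below archives a small stand-in directory twice with the same flags and compares checksums. It is not part of the commit: the staged file names, the fallback epoch value, and the make_tar helper are hypothetical, GNU tar 1.28 or later and sha256sum are assumed, and it passes the epoch as --mtime=@SECONDS rather than the formatted UTC date string that build.sbt produces.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical fallback timestamp; a release build would export SOURCE_DATE_EPOCH itself.
SOURCE_DATE_EPOCH="${SOURCE_DATE_EPOCH:-1739289742}"

# Stage a tiny stand-in payload (file names are illustrative only).
work="$(mktemp -d)"
mkdir -p "$work/stage/bin" "$work/stage/lib"
printf 'echo hello\n' > "$work/stage/bin/daffodil"
printf 'placeholder\n' > "$work/stage/lib/example.jar"

# Reproducibility options matching those prepended in build.sbt, with -f kept last.
make_tar() {
  tar --sort=name --owner=0 --group=0 --numeric-owner \
      --mtime="@${SOURCE_DATE_EPOCH}" \
      -C "$work/stage" -cf "$1" .
}

make_tar "$work/a.tar"
# Perturb on-disk modification times, then archive again.
touch -d '2001-01-01 00:00:00 UTC' "$work/stage/bin/daffodil" "$work/stage/lib"
make_tar "$work/b.tar"

sum_a="$(sha256sum "$work/a.tar" | cut -d' ' -f1)"
sum_b="$(sha256sum "$work/b.tar" | cut -d' ' -f1)"
rm -rf "$work"

if [ "$sum_a" = "$sum_b" ]; then
  echo "tar archives are identical"
else
  echo "tar archives differ"
fi
```

Comparing checksums this way only covers the uncompressed tar; a compression step applied afterwards can add its own metadata and would need to be checked separately.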
