On 31/03/2019 14:08, Yann E. MORIN wrote:
Hello All,

Recent versions of tar have slightly changed the format of archives. Most notably:

  - 1.27 changed gnu long link headers for path elements > 100 characters
  - 1.30 changed --numeric-owner for filenames > 100 characters

In Buildroot, we are using hashes of archives to ensure reproducibility of the source code we build. We also generate tarballs for licensing compliance. In both cases, we use hashes for those archives.

The two changes above mean that we have to restrict the tar versions we accept to a small subset. All the hashes we have so far have been made over the years, and they all use the format that was generated by versions 1.27 to 1.29. As distributions are updated, they all switch to 1.30 or later, so we then always have to build our own version of tar.

Currently, we envision three paths:

  - keep the status quo: this is not nice, because we would always have to build our own tar going forward, for every build;
  - switch to an alternate archive format: this is not nice, because people are used to tarballs, and the alternatives are not all reproducible either; those that are reproducible are much less known, or less practical to use, than tarballs;
  - bite the bullet, and redo all the hashes with the newer tar format: in the future everyone will have a newer tar, and so we won't have to build our own every time.

That last point is what we would prefer, if we could be sure that there would be no change in the output format in the foreseeable future.

So, here's my question: starting with tar-1.32 (the latest release as of today), is the gnu tar format considered stable now, or is there no guarantee about the gnu tar format stability?

For reference, here's how we generate the archives:

    tar cf - \
        --numeric-owner --owner=0 --group=0 --mtime="${date}" \
        --format=gnu -T "list.sorted" >"${output}.tar"

Can we expect this to be reproducible with future tar releases?
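Whatever the answer turns out to be, the behaviour can at least be checked empirically. Below is a minimal sketch, not a definitive test: it archives the same small tree twice with the options quoted above and compares the SHA-256 hashes. The sample tree, the fixed timestamp, the file names first.tar and second.tar, and the TAR override variable are all assumptions made for illustration; none of them come from Buildroot. It assumes GNU tar and sha256sum are installed.

```shell
#!/bin/sh
# Sketch: archive the same tree twice with the options quoted above
# and compare the SHA-256 hashes of the two results.

TAR=${TAR:-tar}    # hypothetical override, e.g. TAR=/path/to/other/tar

workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Hypothetical sample tree; one file name is longer than 100 characters,
# the case affected by the format changes discussed in this thread.
long_name=$(printf '%0120d' 0)
mkdir src
echo hello >src/short.txt
echo world >"src/${long_name}.txt"

# Sorted file list, with a fixed locale so the order is stable.
find src -type f | LC_ALL=C sort >list.sorted

date="2019-03-31 00:00:00 UTC"    # arbitrary fixed timestamp

# make_archive OUTPUT: same tar options as in the quoted command.
make_archive() {
    "$TAR" cf - \
        --numeric-owner --owner=0 --group=0 --mtime="${date}" \
        --format=gnu -T "list.sorted" >"$1"
}

make_archive first.tar
touch src/short.txt    # change the on-disk mtime; --mtime should mask it
make_archive second.tar

hash1=$(sha256sum <first.tar)
hash2=$(sha256sum <second.tar)

# Print the tar version next to the hashes so runs can be compared.
"$TAR" --version | head -n 1
echo "first:  ${hash1}"
echo "second: ${hash2}"

cd / && rm -rf "$workdir"

if [ "$hash1" = "$hash2" ]; then
    echo "identical"
else
    echo "DIFFERENT"
fi
```

A single run only shows that one tar build is consistent with itself. To compare builds, run the script once per build with TAR pointing at each binary and compare the printed hashes. Going by the changes described above, I would expect the hashes from 1.29 and from 1.30 or later to differ for the long file name, but that is something to verify rather than assume.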
As a more general solution for others in a similar predicament, could GNU tar add the ability to explicitly request the formats produced by earlier versions, for example by adding options such as --format=gnu1.27 and --format=gnu1.30 (named for the versions that first introduced the specific format changes, with a view to adding new ones as future changes are introduced)?

Alternatively, could the Buildroot and GNU tar teams check whether one of the historic formats already explicitly supported by the --format option provides the required stability?

Either way, the difference is between two interpretations of the --format option:

A. Restrict the output to headers that are understood by specific old/3rd party unpackers.

B. Reproduce a very specific output, including how tar chooses between seemingly equivalent header types, ignored values etc. This includes bugward compatibility with historic tar output bugs that made the wrong choices.

The 3rd option, consistent with how reproducible builds are otherwise done, is to treat tar as part of the tool chain, thus making the exact build or source version of tar part of the list of exact tool versions needed to reproduce a specific build (just like there is already a requirement to use exact versions of gcc, autotools etc.). Doing so would also allow the historic hash values to remain valid, as they are each tied to the tar version they were historically built with.

Enjoy

Jakob
-- 
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
