Re: Three bytes in a zip file

2023-04-07 Thread Bernhard M. Wiedemann via rb-general



On 06/04/2023 10.28, Larry Doolittle wrote:

> I'm trying to make a process to generate byte-for-byte reproducible zip files.


Try adding the -X option to the zip call.
It stops zip from storing extra file attributes (such as atime/ctime).
And with the patch from
https://github.com/distropatches/zip/commit/501ae4e93fd6fa2f7d20d00d1b011f9006802eae
it will also normalize mtime.
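
For comparison, the same two properties (no extra timestamp fields, one normalized mtime) can be sketched with Python's standard zipfile module instead of the zip tool. This is only an illustration under assumptions: the helper name build_zip, the fallback epoch and the sample file names are hypothetical, and the output bytes will not match what Info-ZIP produces.

```python
import io
import os
import time
import zipfile


def build_zip(files, epoch):
    """Return zip bytes for {name: data}, stamping every entry with `epoch`.

    Hypothetical helper: entries are sorted by name, carry no extra fields,
    and share one fixed DOS timestamp derived from `epoch` in UTC.
    """
    # DOS timestamps cannot represent dates before 1980-01-01.
    stamp = max(time.gmtime(epoch)[:6], (1980, 1, 1, 0, 0, 0))
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):
            info = zipfile.ZipInfo(name, date_time=stamp)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 3               # pin "Unix" on every platform
            info.external_attr = 0o100644 << 16  # regular file, mode 0644
            zf.writestr(info, files[name])
    return buf.getvalue()


# Assumption: fall back to an arbitrary fixed time when the variable is unset.
epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "1680652800"))
sample = {"b.txt": b"second\n", "a.txt": b"first\n"}
first = build_zip(sample, epoch)
# Same content, different insertion order: the result should not change.
second = build_zip(dict(reversed(list(sample.items()))), epoch)
print("identical:", first == second)
```

Sorting the names and pinning create_system and external_attr removes the remaining inputs that could otherwise differ between machines or runs.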


Ciao
Bernhard M.




Re: Three bytes in a zip file

2023-04-07 Thread Larry Doolittle
Michael -

On Fri, Apr 07, 2023 at 01:31:24PM +0200, Michael Schierl wrote:
> Larry's script already called touch immediately before zip. But I assume
> the nature of atime can mean that any other process may have "won the
> race" and accessed the file just in between these two lines.

That's my working assumption.  Also, the irreproducibility
did not reproduce.  Maybe it _is_ velocity-dependent.  ;-)

> The DOS timestamps encode only mtime, and not ctime or atime.

It does seem simpler and more reliable to keep atime out of it.

On Fri, Apr 07, 2023 at 01:25:04PM +0200, Michael Schierl wrote:
> When I distribute
> ZIP files, I often touch all files to UNIX epoch anyway as I don't want
> to leak the exact time I have built/compiled them.

Right.  Except the time I set them to here is the time of
the source git commit (SOURCE_DATE_EPOCH).

> Another option would be to use an UTC-12:00 timezone like TZ=Etc/GMT+12
> for building the .zip file to ensure the files are "old enough" for
> every place in the world.

That sounds too unexpected to me.  I'll stick to UTC.
In practice, I bet nobody will notice, and if they do, it's easy to explain.

  - Larry


Re: Three bytes in a zip file

2023-04-07 Thread Michael Schierl

Hello John,


On 07.04.2023 at 03:56, John Gilmore wrote:

> Larry Doolittle wrote:
>
> > $ diff <(ls --full-time -u fab-ea2bb52c-ld) <(ls --full-time -u fab-ea2bb52c-mb)
> > 22c22
> > < -rw-r--r-- 1 redacted redacted  644661 2023-04-04 18:10:00.0 -0700 marble-ipc-d-356.txt
> > ---
> > > -rw-r--r-- 1 redacted redacted  644661 2023-04-06 00:25:03.0 -0700 marble-ipc-d-356.txt
>
> So I'm guessing that even before the zip file is re-created, the rebuild
> process is leaking the rebuild timestamp into the last-modified metadata
> of the generated marble-ipc-d-356.txt file?


atime is not the same as mtime. The -u switch shows atime.


> That seems like it should
> be handled by the build process explicitly setting its timestamp to
> something related to the last-source-code-checkin time (with "touch
> --date=XXX") rather than to current time.


Larry's script already called touch immediately before zip. But I assume
the nature of atime can mean that any other process may have "won the
race" and accessed the file just in between these two lines.
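
The difference between the two timestamps can be observed with a small Python sketch. It is only an illustration: the fixed time is a hypothetical stand-in for SOURCE_DATE_EPOCH, the file is a throwaway placeholder, and whether the read actually moves atime depends on the filesystem's mount options (relatime, noatime, ...), so no particular outcome is claimed for that part.

```python
import os
import tempfile

SOURCE_EPOCH = 1680652800  # hypothetical fixed time standing in for SOURCE_DATE_EPOCH

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "marble-ipc-d-356.txt")
    with open(path, "w") as fh:
        fh.write("placeholder\n")

    # Similar in spirit to `touch --date=@EPOCH`: sets both atime and mtime.
    os.utime(path, (SOURCE_EPOCH, SOURCE_EPOCH))
    before = os.stat(path)

    # Any reader (an indexer, a backup job, `cat`) may bump atime at this
    # point; here the script itself plays that role.
    with open(path) as fh:
        fh.read()
    after = os.stat(path)

    mtime_stable = before.st_mtime == after.st_mtime == SOURCE_EPOCH
    atime_changed = after.st_atime != before.st_atime
    print("mtime stable:", mtime_stable)
    print("atime changed by the read:", atime_changed)
```

Reading never touches mtime, which is why an archive that records only mtime is not exposed to this race.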


> Truncating the timestamps to DOS timestamps wouldn't work to eliminate
> this difference anyway, since the date in the two files is two days
> different; DOS timestamps are accurate to 2 seconds, as I recall.


The DOS timestamps encode only mtime, and not ctime or atime.
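
As a rough illustration of that field layout, here is a Python sketch that packs a Unix time into the 16-bit DOS date and time fields and unpacks it again. The function names and the utc_offset_hours parameter are hypothetical; the sketch assumes times from 1980 onwards, since earlier dates cannot be represented.

```python
import time


def to_dos(epoch, utc_offset_hours=0):
    """Pack a Unix time into (dos_date, dos_time) 16-bit fields.

    `utc_offset_hours` is a hypothetical stand-in for the packer's local
    zone; the format stores wall-clock fields and no zone information.
    """
    y, mo, d, h, mi, s = time.gmtime(epoch + utc_offset_hours * 3600)[:6]
    dos_date = ((y - 1980) << 9) | (mo << 5) | d
    dos_time = (h << 11) | (mi << 5) | (s // 2)  # seconds kept in 2 s units
    return dos_date, dos_time


def from_dos(dos_date, dos_time):
    """Unpack the two fields into (year, month, day, hour, minute, second)."""
    return (
        (dos_date >> 9) + 1980,
        (dos_date >> 5) & 0x0F,
        dos_date & 0x1F,
        dos_time >> 11,
        (dos_time >> 5) & 0x3F,
        (dos_time & 0x1F) * 2,
    )


even = to_dos(1680652800)
odd = to_dos(1680652801)  # one second later, same stored value
print(from_dos(*even), even == odd)
```

There is exactly one date/time pair per entry, so atime and ctime have nowhere to go unless an extra field carries them.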


Regards,


Michael



Re: Three bytes in a zip file

2023-04-07 Thread Michael Schierl

Hello,


On 06.04.2023 at 23:59, Larry Doolittle wrote:


> Do you know of any tooling that can help decode zip file contents in general?
> Ideally something that could be absorbed into diffoscope?
> Maybe that one-liner above would be a useful addition to diffoscope.


I don't know.

I would assume that the usual commercial reverse engineering or forensic
applications would also include a dissector for .zip files, but those
could probably not be included in diffoscope anyway.
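
Short of a full dissector, the extra fields can at least be listed with Python's standard zipfile module. The following is a minimal sketch, not an existing tool: parse_extra, describe and the small ID table are hypothetical names, only the central directory copy of each extra field is inspected, and the demo archive is built in memory purely to have something to parse.

```python
import io
import struct
import zipfile

# A few well-known header IDs; anything else is reported as "unknown".
KNOWN_IDS = {
    0x0001: "Zip64 extended information",
    0x000A: "NTFS timestamps",
    0x5455: "extended timestamp (UT)",
    0x7875: "Info-ZIP Unix UID/GID (ux)",
}


def parse_extra(extra):
    """Split a raw extra-field blob into (header_id, payload) pairs."""
    fields, pos = [], 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        fields.append((header_id, extra[pos + 4 : pos + 4 + size]))
        pos += 4 + size
    return fields


def describe(zip_bytes):
    """Return one line per extra field found in the central directory."""
    lines = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            for header_id, payload in parse_extra(info.extra):
                label = KNOWN_IDS.get(header_id, "unknown")
                lines.append(
                    f"{info.filename}: 0x{header_id:04x} {label} {payload.hex()}"
                )
    return lines


# Demo input: one entry carrying a hand-built UT field (flags=1, mtime only).
ut_field = struct.pack("<HHBl", 0x5455, 5, 1, 1680652800)
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    entry = zipfile.ZipInfo("example.txt", date_time=(2023, 4, 5, 0, 0, 0))
    entry.extra = ut_field
    zf.writestr(entry, b"hello\n")
report = describe(demo.getvalue())
print("\n".join(report))
```

Running describe() over two supposedly identical archives and diffing the output would point at the entry and field that differ.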


> I took a quick look for the documentation you quoted.
> That's proginfo/extrafld.txt in Debian's zip source package, right?


I used 
(yes I am oldschool and still have that old reference documentation on
my hard disk :-D).

But the documentation you quoted looks more recent and contains way more
extra fields than the "old" Info-Zip document. Probably I'll refer to it
in the future :-)


> It sure looks reverse-engineered.  I guess I shouldn't expect anything
> different for a package where upstream source ends in 2008.  :-/


... implementing a previously proprietary file format from the '80s.


> Bad: the only time stamps left in the file are DOS-style
> implied-local-timezone.  So a zip file prepared with TZ=UTC (as needed
> for reproducibility) will unpack to files with future timestamps (if
> unpacked shortly after being created) for non-expert users in half the
> globe.


Assuming you have "real" timestamps in your ZIP files. When I distribute
ZIP files, I often touch all files to UNIX epoch anyway as I don't want
to leak the exact time I have built/compiled them. But YMMV.

Another option would be to use a UTC-12:00 timezone like TZ=Etc/GMT+12
for building the .zip file to ensure the files are "old enough" for
every place in the world.
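
The effect of the packing timezone can be sketched in Python with fixed UTC offsets. This is an illustration under assumptions: apparent_skew and its parameters are hypothetical, the instant used is arbitrary, and daylight saving time is ignored. Note that POSIX-style names invert the sign, so Etc/GMT+12 corresponds to an offset of -12 hours.

```python
from datetime import datetime, timedelta, timezone


def apparent_skew(pack_offset_h, unpack_offset_h):
    """Hours by which an unpacked file's mtime appears to lie in the future.

    Both arguments are UTC offsets in hours (hypothetical parameters).
    The archive stores only wall-clock fields, so the unpacker interprets
    them in its own zone. A positive result means "in the future".
    """
    now = datetime(2023, 4, 7, 12, 0, tzinfo=timezone.utc)  # arbitrary instant
    pack_tz = timezone(timedelta(hours=pack_offset_h))
    unpack_tz = timezone(timedelta(hours=unpack_offset_h))
    wall_clock = now.astimezone(pack_tz).replace(tzinfo=None)  # what is stored
    seen_as = wall_clock.replace(tzinfo=unpack_tz)             # what is read back
    return (seen_as - now) / timedelta(hours=1)


offsets = range(-12, 15)
utc_packed = {off: apparent_skew(0, off) for off in offsets}
gmt12_packed = {off: apparent_skew(-12, off) for off in offsets}
print("packed in UTC, worst case:", max(utc_packed.values()))
print("packed in UTC-12, worst case:", max(gmt12_packed.values()))
```

With the archive packed at UTC-12, no unpacker at an offset between -12 and +14 hours sees a timestamp ahead of its own clock; the price is that everyone sees files that look up to 26 hours old.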


Regards,


Michael


Re: Please review the draft for March's report

2023-04-07 Thread Chris Lamb
Chris Lamb wrote:

> Please review the draft for March's Reproducible Builds report:

This has now been published — thanks to all who contributed.

If possible, please share the following link:

  https://reproducible-builds.org/reports/2023-03/

.. and also consider retweeting:

  https://twitter.com/ReproBuilds/status/1644283929598337024


Regards,

-- 
  o
⬋   ⬊  Chris Lamb
   o o reproducible-builds.org 
⬊   ⬋
  o