On Wed, Apr 14, 2021 at 10:36 PM Joachim Wuttke <j.wut...@fz-juelich.de> wrote:
>
> If argument fname of savetxt(fname, X, ...) ends with ".gz" then
> array X is not only converted to text, but also compressed using gzip.
>
> The format gzip [1] has a timestamp. The Python module gzip.py [2]
> sets the timestamp according to an optional constructor argument
> "mtime". By default, the current time is used.
>
> This makes the file written by savetxt(*.gz, ...) non-deterministic.
> This is unexpected and confusing in a numerics context.
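
To illustrate the timestamp effect described in the quoted text, here is a minimal sketch that uses only the standard-library gzip module and does not involve numpy at all. The helper name gzip_bytes and the sample payload are made up for this example. It assumes the header layout from RFC 1952, where the four bytes at offsets 4 to 7 hold MTIME in little-endian order.

```python
import gzip
import io


def gzip_bytes(payload: bytes, mtime=None) -> bytes:
    """Compress payload in memory (hypothetical helper for illustration).

    mtime is written into the gzip header; None means "use the current time",
    which is the default behaviour of gzip.GzipFile.
    """
    buf = io.BytesIO()
    # filename="" keeps the optional file-name field out of the header
    with gzip.GzipFile(filename="", mode="wb", fileobj=buf, mtime=mtime) as f:
        f.write(payload)
    return buf.getvalue()


text = b"1.0 2.0\n3.0 4.0\n"  # stand-in for what savetxt would write

# With a fixed mtime, two runs produce byte-identical output.
a = gzip_bytes(text, mtime=0)
b = gzip_bytes(text, mtime=0)
print(a == b)

# Different timestamps give different files for the same payload.
c = gzip_bytes(text, mtime=1)
d = gzip_bytes(text, mtime=2)
print(c == d)

# The MTIME header field (bytes 4..7, little-endian) reflects the argument.
print(int.from_bytes(a[4:8], "little"), int.from_bytes(d[4:8], "little"))

# The payload itself is unaffected by the timestamp.
print(gzip.decompress(a) == gzip.decompress(d) == text)
```

With the default mtime=None, the header carries the time of writing, so files produced at different times differ even though the compressed payload is the same.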

Related: same for np.savez https://github.com/numpy/numpy/issues/9439

András

> I let different versions of a program generate *.gz files, and ran
> the "diff" util over pairs of output files to check whether any bit
> had changed. To my surprise, confusion, and desperation, output
> always had changed, and kept changing when I ran unchanged versions
> of my program over and again. So I learned the hard way that the
> *.gz files contain a timestamp.
>
> Regarding the module gzip.py, I submitted a pull request to improve
> description of the optional argument mtime, and hint at the possible
> choice mtime = 0 that makes outputs deterministic [3].
>
> Regarding numpy, I'd propose a bolder measure:
> To let savetxt(fname, X, ...) store exactly the same information in
> compressed and uncompressed files, always invoke gzip with mtime = 0.
>
> I would like to follow up with a pull request, but I am unable to
> find out how numpy.savetxt is invoking gzip.
>
> Joachim
>
> [1] https://www.ietf.org/rfc/rfc1952.txt
> [2] https://docs.python.org/3/library/gzip.html
> [3] https://github.com/python/cpython/pull/25410
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion