On Fri, Jul 07, 2017 at 06:42:25PM +0200, Danny Milosavljevic wrote:
> Leo Famulari <[email protected]> wrote:
>
> > > That leaves the document UUID - and upstream, in some of the other
> > I think the lowest risk is to do nothing to Ghostscript and move the PDF
> > documentation to a separate 'doc' output. Then, we could have
> > reproducible binaries and ignore the PDF issues for now. Does anyone
> > know how many packages include PDF documentation built with Ghostscript?
>
> Aren't the derivations of the doc outputs still a problem? For
> example, Hydra will run out of space sooner or later because it keeps
> building them, right?

Do these timestamps and UUID affect the derivations? I figured they only
affected the result of running the derivation, that is, the output of
the build process. Those outputs are what we'd like to create
reproducibly, but they don't cause rebuilds if they are not
reproducible. If a package's dependency graph is identical to before,
Guix (and I assume Hydra) will not rebuild it, even if we humans know
that the built output is unreproducible, such as when timestamps are
embedded. My apologies if I misinterpreted your question.

We run out of space and have to garbage collect periodically anyways.
Regardless, once we own the Hydra machine, I'd like for us to buy a huge
amount of storage and keep built outputs for much longer than we do now.
In practice, it's not really possible to go back in time more than 6
months of Guix, due to missing upstream sources and test suites with
expiration dates.

> > 2) At least some of the patches in the related Ghostscript discussions
> > seem to be proof of concepts rather than finished code:
> > https://bugs.ghostscript.com/show_bug.cgi?id=697484#c3
> > So, if these patches came from there, we'd want to be extra careful.
>
> No, I wrote the ones here without external sources (except for the
> direct discussion on my newish upstream bug report, and the PDF and
> XMP specifications - whatever worth they have).

Ah, thanks for the clarification.

> > By the way, this is the patch used for Debian's latest Ghostscript
> > package:
> >
> > https://anonscm.debian.org/git/printing/ghostscript.git/tree/debian/patches/2010_add_build_timestamp_setting.patch?id=e2bf3ad7026afe13636d4937430c3fdae7854078
> >
> > That patch was not reviewed on a public forum, at least nothing I can
> > find with Google. Again, I'd want to get the Ghostscript team's advice.
>
> On such an approach they advised that we should only generate *unique*
> UUIDs. But the UUIDs are generated from these times. So that linked
> patch would generate multiple non-unique uuids on systems.
>
> That's why I removed the entire UUID and Time sections and actually
> didn't fiddle with the ghostscript-internal times at all. Builds
> reproducibly.
>
> I wonder how many packages actually use the ghostscript pdf writer
> too. How to find that out?
>
> Note that groff itself also fails to build reproducibly without the
> patches.
>
> In any case, the patch 2/2 is quite tame (it looks scary because of
> the printf splitting, but it's actually just either leaving "/ID[...]"
> off or not, globally).
>
> But I understand that it would be even easier to do nothing. Wouldn't
> make the stuff reproducible, though.
>
> I'd vote for an environment variable to disable UUID printing and also
> Time header printing. That way it would do everything normally in
> regular usage - but when used in packages, it would just not *print*
> the problematic stuff. No internal state is changed at all by the
> patches.

Okay, thank you for explaining this (especially if you already explained
it! It's hard to join a conversation like this halfway through). I'll
read your patches carefully later today.
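
To make the derivation-versus-output distinction concrete, here is a toy
sketch in Python. It is only an illustration of the idea and not how Guix
or Hydra actually compute anything: the names toy_derivation_hash and
toy_build, the hashing scheme, and the builder strings are all made up.
The assumption it encodes is that a derivation is identified by its
inputs alone, while the build output may embed a timestamp and a UUID.

```python
import hashlib
import time
import uuid


def toy_derivation_hash(name, builder_script, input_hashes):
    """Hypothetical stand-in for a derivation hash: depends only on inputs."""
    h = hashlib.sha256()
    h.update(name.encode())
    h.update(builder_script.encode())
    for dep in sorted(input_hashes):
        h.update(dep.encode())
    return h.hexdigest()


def toy_build(builder_script):
    """Pretend build whose output embeds a timestamp and a random UUID."""
    return f"{builder_script}|{time.time_ns()}|{uuid.uuid4()}".encode()


# Two evaluations of the same package with the same dependency graph.
gs = toy_derivation_hash("ghostscript", "configure && make", [])
doc_a = toy_derivation_hash("groff-doc", "make pdf", [gs])
doc_b = toy_derivation_hash("groff-doc", "make pdf", [gs])

# Two runs of the same (unreproducible) build.
out_a = hashlib.sha256(toy_build("make pdf")).hexdigest()
out_b = hashlib.sha256(toy_build("make pdf")).hexdigest()

print(doc_a == doc_b)  # True: identical inputs, so nothing looks changed
print(out_a == out_b)  # False: the embedded UUID makes the outputs differ
```

In this model the unreproducible output never feeds back into the
derivation hash, which is why I would not expect it to trigger rebuilds
by itself.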
