Hi Ricardo, Ricardo Wurmus <[email protected]> writes:
> Hi Tim, > > thanks for the patch. > >> From bb29ee8ccc656b86039127b31fd8b79533927053 Mon Sep 17 00:00:00 2001 >> From: Timothy Sample <[email protected]> >> Date: Wed, 2 Jan 2019 16:40:48 -0500 >> Subject: [PATCH] gnu: ghc: Sort packages before writing binary cache. >> >> This improves the reproducibility of packages built with the Haskell >> build system. >> >> * gnu/packages/haskell.scm (ghc)[arguments]: Add a phase that patches >> 'ghc-pkg' so that it sorts packages before generating a binary cache. > > Okay. > >> + ;; This phase patches the 'ghc-pkg' command so that it sorts >> + ;; the list of packages in the binary cache it generates. >> + (add-after 'unpack 'patch-ghc-pkg >> + (lambda _ >> + (substitute* "utils/ghc-pkg/Main.hs" >> + (("import Data.List") >> + (string-append "import Data.List\n" >> + "import Data.Ord (comparing)")) >> + (("pkgsCabalFormat = packages db") >> + (string-append "pkgsCabalFormat = sortBy" >> + " (comparing (display . installedUnitId))" >> + " (packages db)"))))) > > This sorts the list “pkgsCabalFormat” in “updateDBCache” by the display > value of the “installedUnitId” field of each package. According to the > documentation at [1], the UnitId type has an Ord instance, so you > probably don’t need “display”; you don’t need to sort strings but can > sort the UnitId values directly. > > [1]: > https://www.haskell.org/cabal/release/latest/doc/API/Cabal/Distribution-Types-UnitId.html#t:UnitId Whoops! I remember checking for an Ord instance, but I must of missed it. The documentation is embarrassingly clear. :p > I’m not sure about using installedUnitId here. Is this field unique? > “sourcePackageId” is the combination of package name and version. I > don’t understand the UnitId documentation, so I can’t say if that value > is any better. Based on what I see in the source code and in the cache files themselves, this is best field. However, I am just guessing and testing here. Discussion around this gets into territory I’m unfamiliar with (I’ve never heard of a “Backpack”, for instance). > I wonder if it would be better to sort the result of > “getDirectoryContents” instead. As far as I understand, this is the > cause of non-determinism here. The function “readParseDatabase” (which > contains the “getDirectoryContents” call) is used in multiple places > throughout “ghc-pkg/Main.hs”. > > The most appropriate line to modify would then be this: > > confs = map (path </>) $ filter (".conf" `isSuffixOf`) fs > > where “fs” is the list of FilePath values (strings). I think you can > just do this: > > confs = map (path </>) $ filter (".conf" `isSuffixOf`) (sort fs) > > because “fs” is of type [FilePath], which is [String], which is sortable > via “sort” as String has an Ord instance. > > What do you think? I thought about this approach, but I was worried it wouldn’t be so easy. What you suggest looks pretty straight-forward though. I will test everything with this approach and report back. If it works, I agree that it is better. Thanks for the careful review! -- Tim
