I needed a way to create reproducible builds, regardless of the developer's 
environment. Luckily, Go provides almost everything needed for that 
out-of-the-box, and there is a great blog post by Filippo on the 
topic: https://blog.filippo.io/reproducing-go-binaries-byte-by-byte. If we 
have the same Go version and the same set of dependencies (which is easy 
with the vendor/ approach), the only remaining problem is the absolute path 
of the working directory. In other words, the same code built in the same 
dev environment in `GOPATH/src/project1` and `GOPATH/src/project2` will 
yield different binaries. There is an open issue for that in Go, and it 
will hopefully be addressed in Go 1.12 
(https://github.com/status-im/status-react/issues/5587).

For now, the easy approach is, of course, to use Docker for the build, but 
that feels too heavy just to ensure the same directory. Spoofing the 
directory with LD_PRELOAD hacks or using `chroot` also has obvious 
drawbacks: the need for a C toolchain and root access, respectively.

After analyzing the binaries, I realized that they differ only in the 
buildid stamp; the rest is the same. The buildid is explained very well here: 
https://github.com/golang/go/blob/master/src/cmd/go/internal/work/buildid.go#L24

For a quick recap, every Go package or binary is stamped with a buildid 
value, which is essentially four hashes separated by slashes:

   actionID(binary)/actionID(main.a)/contentID(main.a)/contentID(binary)

where:
 - actionID is a unique identifier of the inputs (sources, file names, 
Go version, etc.)
 - contentID is a unique identifier of the outputs (the actual content 
produced by the compiler/linker)
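
To make the layout concrete, here is a small shell sketch that splits a 
buildid string into its four fields. The helper name `buildid_field` and 
the sample ID are made up for illustration; a real ID would come from 
`go tool buildid`.

```shell
#!/usr/bin/env bash
# buildid_field ID N prints the N-th slash-separated field (1..4) of a
# buildid string. Hypothetical helper, not part of the Go toolchain.
buildid_field() {
    printf '%s\n' "$1" | cut -d'/' -f"$2"
}

# Sample ID with made-up hashes, in the layout described above.
id='aaaa/bbbb/cccc/dddd'
echo "actionID(binary):  $(buildid_field "$id" 1)"
echo "actionID(main.a):  $(buildid_field "$id" 2)"
echo "contentID(main.a): $(buildid_field "$id" 3)"
echo "contentID(binary): $(buildid_field "$id" 4)"
```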

So my thinking went in the following direction: *I don't care if the 
actionID (inputs) differs, but I do care if the contentID (outputs) 
differs.*

If the contentID is equal, I can just overwrite the actionID with the 
"expected" one and get the same binary, byte for byte. This can be fully 
automated in a Makefile or script. So the steps for a reproducible build 
are the following:

 - build binary - `go build -ldflags "-s -w" -asmflags=-trimpath="$(pwd)" 
-gcflags=-trimpath="$(pwd)"`
 - extract buildid - `go tool buildid myapp`
 - compare buildid's contentID values to known ones - `diff <(go tool 
buildid ./myapp  | cut -d'/' -f3) <(cat release.buildid.txt  | cut -d'/' 
-f3)`
 - if they're equal, assume that the build is the same and just rewrite 
the buildid value inside the binary - `objcopy --update-section 
.note.go.buildid=release.buildid.bin ./myapp` for ELF
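
The steps above can be sketched as a pair of shell functions. This is only 
a rough outline under some assumptions: the function names are 
hypothetical, `-o` is added so the output name is explicit, 
release.buildid.txt is assumed to hold the `go tool buildid` output of the 
release binary, and release.buildid.bin the raw contents of its 
.note.go.buildid section.

```shell
#!/usr/bin/env bash
# same_content_id A B succeeds when field 3, contentID(main.a), is
# non-empty and identical in both buildid strings. cut -s drops input
# that contains no '/' at all, so malformed IDs never compare equal.
same_content_id() {
    local a b
    a="$(printf '%s\n' "$1" | cut -s -d'/' -f3)"
    b="$(printf '%s\n' "$2" | cut -s -d'/' -f3)"
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# repro_build BINARY builds the package in the current directory,
# compares contentIDs, and patches the ELF note only on a match.
repro_build() {
    local bin="$1" built expected
    go build -o "$bin" -ldflags "-s -w" \
        -asmflags=-trimpath="$(pwd)" -gcflags=-trimpath="$(pwd)" || return 1
    built="$(go tool buildid "$bin")" || return 1
    expected="$(cat release.buildid.txt)" || return 1
    if ! same_content_id "$built" "$expected"; then
        echo "contentID mismatch, not patching $bin" >&2
        return 1
    fi
    objcopy --update-section .note.go.buildid=release.buildid.bin "$bin"
}
```

It would be invoked as `repro_build ./myapp` from the project directory. 
Refusing to patch on a mismatch keeps a genuinely different build from 
being relabeled with the release buildid.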

In my tests this results in byte-for-byte identical binaries.

I have two concerns with this approach:
 1) I might be missing some corner cases, especially when patching binaries 
of different formats. What are the perils of patching a binary here?
 2) The buildid hash is actually a truncated version of the real hash (256 
to 96 bits), which increases the collision probability. That is totally 
fine for the task "determine if the binary should be rebuilt", but might be 
a concern for the task "guarantee that the build is the same". More 
explanation here: 
https://github.com/golang/go/blob/master/src/cmd/go/internal/work/buildid.go#L113
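
One possible way to sidestep the truncation concern is to treat the 
contentID comparison only as a pre-check and make the final decision on a 
full-length digest of the patched binary. A minimal sketch, assuming the 
release publishes a SHA-256 checksum and that `sha256sum` from GNU 
coreutils is available; `verify_sha256` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# verify_sha256 FILE EXPECTED succeeds only when the SHA-256 digest of
# FILE equals EXPECTED (lowercase hex). An unreadable file yields an
# empty digest, which is rejected.
verify_sha256() {
    local file="$1" expected="$2" actual
    actual="$(sha256sum "$file" | cut -d' ' -f1)"
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

After patching, something like `verify_sha256 ./myapp "$release_sum"` 
(where `release_sum` is the published checksum) would confirm that the 
whole file matches, not just the truncated IDs.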

Any thoughts on that? What else am I missing? Would this be a viable 
workaround for reproducible builds until #5587 is resolved?
