On Fri, May 11, 2018 at 4:06 PM, Gregory Szorc <g...@mozilla.com> wrote:

> On Wed, May 9, 2018 at 11:01 AM, Ted Mielczarek <t...@mielczarek.org>
> wrote:
>
> > On Wed, May 9, 2018, at 1:11 PM, L. David Baron wrote:
> > > > mozregression won't be able to bisect into inbound branches then,
> > > > but I believe we've always been expiring build artifacts created
> > > > from integration branches after a few months in any case.
> > > >
> > > > My impression was that people use mozregression primarily for
> > > > tracking down relatively recent regressions. Please correct me if
> > > > I'm wrong.
> > >
> > > It's useful for tracking down regressions no matter how old the
> > > regression is; I pretty regularly see mozregression finding useful
> > > data on bugs that regressed multiple years ago.
> >
> > To be clear here--we still have an archive of nightly builds dating back
> > to 2004, so you should be able to bisect to a single day using that. We
> > haven't ever had a great policy for retaining individual CI builds like
> > these tinderbox-builds. They're definitely useful, and storage is not
> > that expensive, but given the number of build configurations we produce
> > nowadays and the volume of changes being pushed we can't archive
> > everything forever.
>
>
> It's worth noting that once builds are deterministic, a build system is
> effectively a highly advanced caching mechanism. It follows that cache
> eviction is therefore a tolerable problem: if the entry isn't in the cache,
> you just build again! Artifact retention and expiration boil down to a
> trade-off between the cost of storage and the convenience of accessing
> something immediately (as opposed to waiting several dozen minutes to
> populate the cache).
>
> The good news is that Linux Firefox builds have been effectively
> deterministic (modulo PGO and some minor build details like the build time)
> for several months now (thanks, glandium!). And moving to Clang on all
> platforms will make it easier to achieve deterministic builds on other
> platforms. The bad news is we still have many areas of CI that are not
> hermetic and attempts to retrigger Firefox build tasks in the future have a
> very high possibility of failing for numerous reasons (e.g. some dependent
> task of the build hits a 3rd party server that is no longer available or
> has deleted a file). In other words, our CI results may not be reproducible
> in the future. So if we delete an artifact, even though the build is
> deterministic, we may not have all the inputs to reconstruct that result.
>
> Making CI hermetic and reproducible far in the future is a hard problem.
> There are esoteric failure scenarios like "what if we need to fetch content
> from a server in 2030 but TLS 1.2 has been disabled due to a critical
> vulnerability and code in the hermetic build task doesn't support TLS 1.3."
> In order to realistically achieve reproducible builds in the future, we
> need to store *all* inputs somewhere reliable where they will always be
> available. Version control is one possibility. A content-indexed service
> like tooltool is another. (At Google, they check in the source code for
> Clang, glibc, binutils, Linux, etc. into version control so all they need is
> a version revision and a bootstrap compiler (which I also suspect they
> check into the monorepo) to rebuild the world from source.)
>
> What I'm trying to say is we're making strides towards making builds
> deterministic and reproducible far in the future. So hopefully in a few
> years we won't need to be concerned about deleting old data because our
> answer will be "we can easily reproduce it at any time."
>

This might end up being true, but it seems a bit optimistic to me. I've
worked with lots of systems much simpler than our builds that were in
theory reproducible, but then found, when I went back to reproduce the
results, that things weren't so simple. You allude to one case above: it's
one thing to have reproducible builds from days ago and quite another from
years ago.
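
To make the "build system as a cache" framing concrete, here is a minimal
Python sketch of the failure mode. Everything in it is hypothetical and for
illustration only (ArtifactCache, InputsUnavailable, fetch_inputs, and build
are made-up names, not part of any Mozilla tooling): a cache miss is only
harmless if the inputs can still be fetched when the rebuild happens.

```python
import hashlib


class InputsUnavailable(Exception):
    """Raised when an input needed for a rebuild can no longer be fetched."""


class ArtifactCache:
    """Toy artifact store keyed by a digest of (revision, build config)."""

    def __init__(self):
        self._artifacts = {}

    @staticmethod
    def key(revision, config):
        # A deterministic build means the same revision and config always
        # map to the same artifact, so their digest can serve as the key.
        return hashlib.sha256(f"{revision}\n{config}".encode()).hexdigest()

    def get(self, revision, config, fetch_inputs, build):
        k = self.key(revision, config)
        if k in self._artifacts:
            return self._artifacts[k]  # cache hit: no rebuild needed
        # Cache miss: the rebuild only works if every input is still there.
        inputs = fetch_inputs(revision)  # may raise InputsUnavailable
        artifact = build(inputs, config)
        self._artifacts[k] = artifact
        return artifact

    def expire(self, revision, config):
        # Model artifact expiration as dropping the cache entry.
        self._artifacts.pop(self.key(revision, config), None)
```

If fetch_inputs raises after expire() has run, the artifact is simply gone,
which is the case I'm worried about.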

Given the incredibly low cost of storage (the street price of Glacier is
$0.004/GB/month) [0], I'd be pretty hesitant to delete data which we thought
we might want to use again just because we figured we'd reproduce it.
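
As a rough back-of-the-envelope sketch of that price, assuming the quoted
rate stays flat and ignoring retrieval and request charges (the function
name and the sizes passed to it are hypothetical, not measurements of our
actual artifact volume):

```python
# Storage price quoted above, in USD per GB per month.
GLACIER_USD_PER_GB_MONTH = 0.004


def storage_cost_usd(size_gb, months, price=GLACIER_USD_PER_GB_MONTH):
    """Return the storage-only cost of keeping size_gb for the given months."""
    return size_gb * months * price


# Example with a made-up size: 1 TB (1,000 GB) kept for one year.
one_tb_for_a_year = storage_cost_usd(1000, 12)
```

At that rate, each TB kept for a year costs roughly $48 in storage alone.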

-Ekr

[0] https://aws.amazon.com/glacier/

> _______________________________________________
> dev-platform mailing list
> dev-platform@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-platform
>