On Sat, 2020-08-01 at 15:26 -0700, Kevin Fenzi wrote:
> On Sat, Aug 01, 2020 at 02:03:40PM -0600, Jeff Law wrote:
> > On Sat, 2020-08-01 at 12:12 +0200, Kevin Kofler wrote:
> > > Hi,
> > > 
> > > seeing the amount of fallout from LTO, I really think that this feature
> > > ought to be dropped from F33, and evaluated carefully for F34 (i.e., can it
> > > be done without breaking the build of or miscompiling a large part of the
> > > distribution, once the bugs such as the ld bug discussed in this thread are
> > > fixed, or is it just unsafe to enable by default to begin with?). I.e.,
> > > revert it for F33 for sure, then decide whether to retry it for F34 or can
> > > it permanently.
> > Most of the fallout has been Nick pushing through binutils builds that are
> > broken.  Seriously, there's been at least 4 builds pushed through that kept
> > bringing back the *same* problem.
> > 
> > And just to be clear, this has been 6+ months of behind-the-scenes work to find
> > and identify issues, fix broken packages, put global mitigations of broken crap
> > in place, opt out packages that do things that are fundamentally
> > incompatible with LTO, etc.  In fact it was that behind-the-scenes work that
> > pushed this feature from F32 to F33 as it just wasn't ready to go in F32.
> > 
> > I think the chances of a serious mis-compilation of large parts of the
> > distribution are small.  The one mis-compilation we know about was a latent
> > linker bug that just happened to be triggered by LTO, and for that particular
> > bug we know how to identify any packages that might have been broken.
> > 
> > Frankly, there's been more fallout from infrastructure breakage and cmake issues
> > than anything.  I went through the first ~1000 failures proactively looking for
> > things that were potentially LTO related and fixing the half-dozen or so I
> > found, but by far the s390 infrastructure and cmake changes have caused more
> > failures than anything.
> > 
> > As has always been the case, I'm here to address any problems that arise and use
> > my 30 years of experience with GCC development as well as distribution mass
> > rebuilds to make informed decisions about the best course of action for any
> > particular problem.
> 
> Yeah, the s390x failures were annoying. I have several ideas to make
> things more robust that hopefully we can do before next mass rebuild: 
> 
> * move the cache host from a z/vm instance to a kvm one. 
> * We have the kvm ones oversubscribed on cpus, so I'd like to drop all
> of them from 4 cpus to 3. 
> * We might play with the weight on them so koji doesn't run as many jobs
> at a time as it does now.
> * Make sure ci/koji-simple-ci/koschei isn't doing any long running
> builds when the mass rebuild starts. A gcc or libreoffice build can take
> up a builder for a long time. 
> * Run the mass rebuild with --fail-fast so if something fails on some
> other arch, it never even needs to run on s390x. 
> 
> Anyhow, the mass rebuild is over and tagged in. Rawhide compose is
> running and should hopefully finish later today. 
> 
> The second pass took failures from 4162 to 2833, so that helped
> a lot: https://kojipkgs.fedoraproject.org/mass-rebuild/f33-failures.html
Cool.  Thanks for the update.  I'll start scanning through those.  It looks like
some of the cmake things are getting addressed which I'm sure helps too.

Jeff