Just to let everyone know - our quest to stabilise and speed up the CI continues.

I just merged one of the final (yeah, final final final ...) optimizations: a change that only fixes a typo in .md files or non-doc .rst files should now take ~1m to complete. Yep. You read that right. Some simple changes will no longer trigger the full test suite, just a small relevant subset, which should be really fast.

We will observe it and see whether we need to adjust it or fix any other teething issues, but I hope this will help us fight the current job limits we have across the whole Apache organisation before we - hopefully - get self-hosted runners in place.

J.

On Tue, Oct 13, 2020 at 9:19 PM Jarek Potiuk <[email protected]> wrote:

> I do expect some small teething problems again, but I hope the big one is
> over, and I will try to address those problems if they arise. Apologies for
> that - this was rather difficult to test at "Apache Organization" scale. We
> are also talking about adding some custom GitHub runners, because we expect
> the situation to deteriorate in the future if we don't.
>
> On Tue, Oct 13, 2020 at 9:14 PM Jarek Potiuk <[email protected]>
> wrote:
>
>> There is bad news and good news :).
>>
>> * The bad news is that the change did not go well in its original scope.
>> It turned out that many small jobs are not a good idea when you have 180
>> slots in a queue and a growing number of Apache projects, ours among them,
>> competing for those slots. It seems our jobs got starved a lot, and the
>> effect was 2-3 hour waiting queues, which kept growing in the afternoon as
>> the US started to wake up.
>>
>> * The good news is that I just merged a fix for that - instead of many
>> small jobs, we now group several test types into single jobs, cleaning up
>> in between and reusing the machines. I believe this will be even more
>> optimized, and it uses the same optimization concepts as before.
>>
>> I cancelled all the queued builds and asked people to rebase to the latest
>> master. If you have not done so yet - please do it now!
>>
>> J.
>>
>> On Mon, Oct 12, 2020 at 1:27 AM Daniel Imberman <[email protected]> wrote:
>>
>>> Thanks Jarek! This was much needed and should lead to a cleaner dev
>>> process.
>>>
>>> On Sun, Oct 11, 2020 at 3:37 PM, Jarek Potiuk <[email protected]>
>>> wrote:
>>>
>>> Hello everyone,
>>>
>>> I have really high hopes for the CI change that we implemented over the
>>> weekend. Over the last few weeks we experienced a lot of stability
>>> problems with the CI, and our builds were rarely "green" - mostly due to
>>> intermittent/unrelated problems. We've implemented some workarounds and
>>> split the build into a larger number of smaller jobs, which so far has
>>> proven to be much more stable and "greener".
>>>
>>> You will see a much bigger number of test checks than you used to (up to
>>> 120 or so), but they will be quite a bit faster. Also - if any of the
>>> checks fail for a good reason, you should be able to find information on
>>> how to reproduce the failures locally in the test output - so that you
>>> can fix it.
>>>
>>> We will be watching and fixing any teething problems over the next few
>>> days, but for now - please rebase to the latest master and try it out.
>>>
>>> J.

--
Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129
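
P.S. For anyone curious how this kind of selective triggering can work in principle, here is a minimal sketch. It is not the actual logic merged into the CI: the check names ("static-checks", "docs-build", "full-tests"), the `docs` directory name and the `select_checks` helper are all assumptions made up for illustration. It only shows the idea from this thread, namely that .md changes and non-doc .rst changes need nothing beyond the cheap checks.

```python
from pathlib import PurePosixPath

# Assumption: documentation sources live under a top-level "docs" directory.
DOCS_DIR = "docs"


def select_checks(changed_files):
    """Map a list of changed paths to the set of CI checks worth running.

    The check names are hypothetical labels, not real job names.
    """
    checks = {"static-checks"}  # cheap checks always run
    for name in changed_files:
        path = PurePosixPath(name)
        is_doc_source = len(path.parts) > 1 and path.parts[0] == DOCS_DIR
        if path.suffix == ".md" or (path.suffix == ".rst" and not is_doc_source):
            continue  # text-only change: nothing more to run for this file
        if is_doc_source:
            checks.add("docs-build")  # documentation sources changed
        else:
            checks.add("full-tests")  # anything else may affect behaviour
    return checks


if __name__ == "__main__":
    print(sorted(select_checks(["README.md", "CONTRIBUTING.rst"])))
    print(sorted(select_checks(["docs/index.rst"])))
    print(sorted(select_checks(["src/module.py", "README.md"])))
```

The point of returning a set of checks rather than a yes/no flag is that a CI workflow can then skip each expensive job independently, so a typo fix only pays for the cheap checks.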
