Thanks Jarek, awesome work!

On Sun, Feb 2, 2020 at 11:09 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> It seems that the "timeout" in the last Kerberos job is back now (intermittently); it seems to appear when we run more of those builds in parallel. So one more diagnosis/fix is still needed, I am afraid.
>
> On Sun, Feb 2, 2020 at 11:06 AM Ash Berlin-Taylor <a...@apache.org> wrote:
>
> > Great work Jarek!
> >
> > On 2 February 2020 09:18:52 GMT, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> >>
> >> OK. Master is fixed now (finally!). It is working again, so please rebase all of your open PRs onto master.
> >>
> >> In the end we had a number of different problems, some of them coinciding, which is why it was so hectic and difficult to diagnose:
> >>
> >> - The Travis queue was stalled (at one point we had some 20 builds waiting in the queue), so to save time we did not rebase some PRs and merged them based on old masters.
> >> - Some of the master merges were cancelled, so we could not see which commit broke the build; that made us come up with different hypotheses for the problem.
> >> - Our CI build optimisations (skipping Kubernetes builds when there are no Kubernetes-related changes) caused the contrib/example_dags move to slip under the radar of the PR CI checks.
> >> - Even without the optimisations, Kubernetes Git Sync uses Airflow's master, so we would not have detected this as a PR failure (only after merge).
> >> - We had a number of "false positives" and a lack of detailed logs for Kubernetes.
> >> - We had a mysterious hang in the Kerberos tests, most likely caused by a change in the Travis environment (it's gone now).
> >> - We had Redis test failures caused by the 3.4 release of the redis-py library, which contained a change that made the Redis class unhashable (by adding an __eq__ hook). Luckily they reverted it two hours ago (https://github.com/andymccurdy/redis-py/blob/master/CHANGES).
> >> - We downloaded the Apache RAT tool from a Maven repository, and that repository has been very unstable recently.
> >> - There are a number of follow-up PRs (already merged or building on Travis now) that will resolve those problems and prevent them in the future.
> >>
> >> J.
> >>
> >>
> >> On Thu, Jan 30, 2020 at 11:16 AM Ash Berlin-Taylor <a...@apache.org> wrote:
> >>
> >>> Spent a little bit of time looking at this and it seems it was (super) flaky tests: I've managed to get one commit back on master passing just by retrying the one failed job.
> >>>
> >>> Looking at the latest commit now.
> >>>
> >>> On Jan 30 2020, at 7:54 am, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> >>>
> >>>> It looks like we have a failing master. It seems that yesterday's super-slow Travis queue, together with a number of PRs that were merged without rebasing, caused master to break.
> >>>>
> >>>> I will not be at my PC for a couple of hours at least, so maybe some other committers can take a look in the meantime.
> >>>>
> >>>> J.
> >>>>
> >>>> --
> >>>> Jarek Potiuk
> >>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> >>>>
> >>>> M: +48 660 796 129 <+48660796129>

--
Michał Słowikowski
Polidea <https://www.polidea.com/> | Junior Software Engineer
E: michal.slowikow...@polidea.com