Thanks Jarek, awesome work!

On Sun, Feb 2, 2020 at 11:09 AM Jarek Potiuk <jarek.pot...@polidea.com>
wrote:

> It still seems that the "timeout" in the last Kerberos job is back
> (intermittently) - it seems to appear when we run more of those builds in
> parallel.
> So one more diagnosis/fix is still needed, I am afraid.
>
> On Sun, Feb 2, 2020 at 11:06 AM Ash Berlin-Taylor <a...@apache.org> wrote:
>
> > Great work Jarek!
> >
> > On 2 February 2020 09:18:52 GMT, Jarek Potiuk <jarek.pot...@polidea.com>
> > wrote:
> >>
> >> Ok. Master is fixed now (finally!), so please rebase all of your open
> >> PRs onto master.
> >>
> >> In the end we had a number of different problems, some of them
> >> coinciding, which is why it was so hectic and difficult to diagnose:
> >>
> >>    - The Travis queue was stalled (at one point we had some 20 builds
> >>    waiting in the queue), so to save time we did not rebase some merges
> >>    and merged them from old masters
> >>    - Some of the master merges were cancelled, so we could not see which
> >>    commit broke the build. That made us come up with different hypotheses
> >>    for the problem
> >>    - Our optimisations for CI builds (skipping Kubernetes builds when
> >>    there are no Kubernetes-related changes) caused the
> >>    contrib/example_dags move to slip under the radar of PR CI checks
> >>    - Even without the optimisations, Kubernetes Git Sync uses Airflow
> >>    master, so we would not have detected that through a PR failure (only
> >>    after merge)
> >>    - We had a number of “false positives” and a lack of detailed logs for
> >>    Kubernetes.
> >>    - We had a mysterious hang in the Kerberos tests, but it was likely
> >>    caused by a Travis environment change (it’s gone now)
> >>    - We had Redis test failures caused by the 3.4 release of the redis-py
> >>    library, which contained a change (the Redis class became unhashable
> >>    because an __eq__ hook was added) - luckily they reverted it two hours
> >>    ago (
> >>    https://github.com/andymccurdy/redis-py/blob/master/CHANGES)
> >>    - We downloaded the Apache RAT tool from a Maven repository, and this
> >>    Maven repo has been very unstable recently.
> >>    - There are a number of follow-up PRs (already merged or building on
> >>    Travis now) that will resolve those problems and prevent them in the
> >>    future.
> >>
> >> J.
> >>
> >>
> >> On Thu, Jan 30, 2020 at 11:16 AM Ash Berlin-Taylor <a...@apache.org>
> >> wrote:
> >>
> >>>  Spent a little bit of time looking at this and it seems it was (super)
> >>>  flaky tests -- I've managed to get 1 commit back on master passing by
> >>>  just retrying the one failed job.
> >>>
> >>>  Looking at the latest commit now.
> >>>
> >>>  On Jan 30 2020, at 7:54 am, Jarek Potiuk <jarek.pot...@polidea.com>
> >>>  wrote:
> >>>
> >>>> It looks like we have a failing master - it seems that yesterday's
> >>>> super-slow Travis queue and a number of PRs that were merged without
> >>>> rebasing caused master to be broken.
> >>>>
> >>>>  I will not be at my PC for a couple of hours at least, so maybe some
> >>>>  other committers can take a look in the meantime.
> >>>>
> >>>>  J.
> >>>>
> >>>>  --
> >>>>  Jarek Potiuk
> >>>>  Polidea <https://www.polidea.com/> | Principal Software Engineer
> >>>>
> >>>>  M: +48 660 796 129 <+48660796129>
> >>>>  [image: Polidea] <https://www.polidea.com/>
> >>>>
> >>>>
> >>>
> >>>
>
>


-- 

Michał Słowikowski
Polidea <https://www.polidea.com/> | Junior Software Engineer

E: michal.slowikow...@polidea.com

