Thanks for bringing up the issue of CI stability!

However, I disagree with some points in this thread:

- "We are at approximately 3h for a full successful run."
=> Looking at Jenkins, I see the last successful runs oscillating between
1h53 and 2h42, with a mean of roughly 2h20. Or are you talking about
something other than the Jenkins CI run?

- "For this I propose working towards moving tests from CI to nightly,
especially the ones that take the most time or do black-box testing with
full training of models, and addressing flaky tests by either fixing them
or *disabling them*."
=> Is there any evidence that serious effort has been spent trying to fix
the flaky tests? I know Sheng and Marco have worked on consolidating a list
of flaky tests, but I think simply disabling tests will just make the
platform weaker. Let's organize a flaky test week where each of us takes on
a couple of these flaky tests; hopefully we can make good progress towards
stabilizing the CI.

- "I'd like to disable flaky tests until they're fixed."
=> Wishful thinking, IMO. We know this never happens: if we can't make time
to fix them now, we'll never go back and fix them.

- "I would want a turnaround time of less than 30 minutes and 0% failure rate
on master."
=> With current timings, 30 minutes is barely enough to finish the build
step. Are we proposing to drop some platforms from the build?

I agree with some points:

- "Won't we end up in the same situation with so many flaky tests?"
=> Pretty sure we will.
- "This could be set to 100% for nightly, for example." [for the release]
=> That would be a given for me.
- "I'm also currently working on a system that tracks all test failures, so
this will also cover nightly tests. This will give us actionable data"
=> Awesome! It would be great to have data on this to help prioritize what
to fix.
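For illustration, the release criterion Marco describes (no release if the
test success rate is below X%, with X possibly 100% for nightly) boils down
to a check as small as the sketch below. The function name, the pass/fail
counts as input, and the threshold parameter are assumptions of mine, not
part of the tracking system he is building.

```python
def release_allowed(passed: int, failed: int, threshold_pct: float = 100.0) -> bool:
    """Return True if the test success rate meets the release threshold.

    Hypothetical helper: `passed` and `failed` are assumed to be counts of
    test results collected by the (not yet existing) failure tracker.
    """
    total = passed + failed
    if total == 0:
        # No results means no evidence of success, so block the release.
        return False
    success_pct = 100.0 * passed / total
    return success_pct >= threshold_pct


# With a 100% threshold (the nightly case), a single failure blocks a release.
print(release_allowed(passed=999, failed=1))                     # False
print(release_allowed(passed=999, failed=1, threshold_pct=99.0))  # True
```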

I personally think that if we disable most tests and move them to nightly,
we will decrease trust in, and the stability of, the platform, and we leave
the door open to conflicting changes that create hard-to-debug failures. I
think the biggest potential win here is reducing test flakiness; that is
what is killing productivity. Separately, we can redesign the test pipeline
to run integration and unit tests in parallel, which would immediately cut
about 30 minutes from the CI run. We would then always be under 2h per
build, which seems reasonable if it never fails for no reason.
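To make the arithmetic behind the parallel-stage idea explicit: run one
after another, test stages cost the sum of their durations; run side by
side (assuming enough free executors), they cost roughly the duration of
the slowest stage. The stage names and minute values below are made up for
illustration and are not measurements from our Jenkins.

```python
def wall_time(stages_min: dict[str, int], parallel: bool) -> int:
    """Wall-clock minutes for a set of test stages.

    Sequential: stages run back to back, so their durations add up.
    Parallel: stages start together (assuming free executors), so the
    slowest stage dominates.
    """
    durations = stages_min.values()
    return max(durations) if parallel else sum(durations)


# Hypothetical durations in minutes, NOT measured values.
stages = {"unit": 40, "integration": 30}
print(wall_time(stages, parallel=False))  # 70
print(wall_time(stages, parallel=True))   # 40
```

With two stages, the saving equals the duration of the shorter one, so the
actual gain depends on how balanced the unit and integration stages are.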

Thomas

2018-06-07 8:27 GMT-07:00 Marco de Abreu <marco.g.ab...@googlemail.com>:

> Yeah, I think we are at the point where we have to disable tests.
>
> If a test fails in nightly, the commit would not be reverted since it's
> hard to pin a failure to a specific PR. We will have reporting for failures
> on nightly (they have proven to be stable, so we can enable it right from
> the beginning). I'm also currently working on a system that tracks all test
> failures, so this will also cover nightly tests. This will give us
> actionable data which allows us to define acceptance criteria for a
> release. E.g. if the test success rate is below X%, a release cannot be
> made. This could be set to 100% for nightly, for example.
>
> It would definitely be good if we could determine which tests are required
> to run and which ones are unnecessary. I don't really like the flag in the
> comment (and also it's hard to integrate). A good idea would be some
> analytics on the changed file content. If we have this data, we could
> easily enable and disable different jobs. Since this behaviour is entirely
> defined in GitHub, I'd like to invite everybody to submit a PR.
>
> -Marco
>
>
>
> On Thu, Jun 7, 2018 at 5:20 PM Aaron Markham <aaron.s.mark...@gmail.com>
> wrote:
>
> > I'd like to disable flaky tests until they're fixed.
> > What would the process be for fixing a failure if the tests are done
> > nightly? Would the commit be reverted? Won't we end up in the same
> > situation with so many flaky tests?
> >
> > I'd like to see if we can separate the test pipelines based on the content
> > of the commit. I think that md, html, and js updates should fly through and
> > not have to go through GPU tests.
> >
> > Maybe some special flag added to the comment?
> > Is this possible?
> >
> >
> > On Wed, Jun 6, 2018 at 10:37 PM, Pedro Larroy <pedro.larroy.li...@gmail.com>
> > wrote:
> >
> > > Hi Team
> > >
> > > The time to validate a PR is growing due to our number of supported
> > > platforms and the increased time spent testing and running models. We
> > > are at approximately 3h for a full successful run.
> > >
> > > This is compounded by a build failure rate of more than 50% due to flaky
> > > tests, which is a big drag on developer productivity if you can only get
> > > one or two CI runs on a change per day.
> > >
> > > I would want a turnaround time of less than 30 minutes and 0% failure
> > > rate on master.
> > >
> > > For this I propose working towards moving tests from CI to nightly,
> > > especially the ones that take the most time or do black-box testing with
> > > full training of models, and addressing flaky tests by either fixing them
> > > or disabling them.
> > >
> > > I would like to check if there's consensus on this plan so we are
> > > aligned on pursuing this common goal as a shared effort.
> > >
> > > Pedro.
> > >
> >
>
