Hi

Thanks for your insightful comments. I see the concerns about moving PR
checks to nightly and it worries me too. We should all agree on a tradeoff
of some sorts. Flaky tests not only do not help validate the code, but are
detrimental to our progress. Agree that disabling them is suboptimal but
necessary at this point. A flaky test not only doesn't fulfill its purpose
but it harms everyone's productivity.

I have just sent a fix for one of those insidious flaky tests:
https://github.com/apache/incubator-mxnet/pull/11259
I encourage you to adopt one of those tests and try to find why it fails.
Often it's quite interesting puzzle to dig, since the problem is often not
obvious as in that case which has eluded us for a long time.

@Thomas we are adding ccache support to try to speed up build times and
evaluating if it helps, and by how much.

Also compounded to the flaky tests is the fact that we are doing too many
heavy things like training lengthy models in all different flavors and
packages, downloading a plethora of external datasets and dependencies and
more.  It's just so many things doing serially, that by simple
probability,  the chance of something going wrong somewhere is just
happening too often, and this adds another 2h hours for getting feedback on
your changes.

We need an in depth review of what we should do in CI and what in nightly,
agree then execute.

Pedro.



On Fri, Jun 8, 2018 at 3:43 AM Steffen Rochel <steffenroc...@gmail.com>
wrote:

> I support to create github/Jira for flaky tests and disable for now.
> However, we need to get serious and prioritize fixing the disabled tests.
> Making PR checks smart and test only code impacted by change is a good
> idea, anybody has experience with tools enabling smart validation?
>
> I'm concerned about moving tests from PR check to nightly. We should review
> which are long running tests and evaluate options to reduce runtime. We
> must prevent bad code to get into the repo through automated process as it
> will be cumbersome to find out which PR cause failures or performance
> degradation in nighty regression. Unless we get code coverage measurements
> in place it is difficult to decide which tests are redundant and can either
> be removed or moved to nightly.
> Is the PR validation turn around time the bottleneck? I see a number of PR
> waiting for reviews.
> Steffen
>
> On Thu, Jun 7, 2018 at 6:00 PM Naveen Swamy <mnnav...@gmail.com> wrote:
>
> > Sorry, I missed reading that Pedro was asking to move the tests that run
> > training. I agree with that.
> >
> > Additionally we should make the CI smart as I mentioned above.
> >
> > -Naveen
> >
> >
> > On Thu, Jun 7, 2018 at 3:59 PM, Naveen Swamy <mnnav...@gmail.com> wrote:
> >
> > > -1 for moving to nightly. I think that would be detrimental.
> > >
> > > We have to make our CI a little more smart and only build required
> > > components and not build all components to reduce cost and the time it
> > > takes to run CI. A Scala build need not build everything and run tests
> > > related to Python, etc.,
> > >
> > > Thanks, Naveen
> > >
> > > On Thu, Jun 7, 2018 at 9:57 AM, Marco de Abreu <
> > > marco.g.ab...@googlemail.com> wrote:
> > >
> > >> Thanks a lot for our input, Thomas! You are right, 3h are only hit if
> > >> somebody makes changes in their Dockerfiles and thus every node has to
> > >> rebuild their containers - but this is expected and inevitable.
> > >>
> > >> So far there have not been any big attempts to resolve the number of
> > flaky
> > >> tests. We had a few people fixing some tests (and that's very
> > >> appreciated!!!), but it feels like we're introducing more than we can
> > fix.
> > >> I'd definitely love a flaky-test-bash-week and proposed it a few
> times,
> > >> but
> > >> there was no success so far unfortunately.
> > >>
> > >> We will definitely not drop any platforms. What we will probably do
> > after
> > >> the nightly tests are in place, is move some things like CentOS or
> > >> overlapping Python2/Python3 tests to nightly. We don't need to test
> > >> python2
> > >> and python3 compatibility half a dozen times on different platforms
> for
> > >> every commit.
> > >>
> > >> I've been thinking about merging the integration and unit test state
> and
> > >> I'm pretty tempted to do it. My only concern so far was the increased
> > >> cost.
> > >> I expect the nightly tests to be in place in about 2 weeks. I'd
> propose
> > we
> > >> wait until then and then revisit which runs are actually required for
> > >> every
> > >> PR and which ones can be moved. During that process, we will probably
> > >> consolidate a lot of tests and put them into one stage.
> > >>
> > >> But I agree, past has shown that disabling tests did only mask the
> > problem
> > >> and won't get them fixed. Also, quite a lot of failures have proven to
> > be
> > >> actual bugs in our code. So from a customer perspective, we should
> > >> actually
> > >> give these failures a high priority. I hope they will get into the
> > >> spotlight after I provide the proper statistics.
> > >>
> > >> -Marco
> > >>
> > >>
> > >> On Thu, Jun 7, 2018 at 6:35 PM Thomas DELTEIL <
> > thomas.delte...@gmail.com>
> > >> wrote:
> > >>
> > >> > Thanks for bringing the issue of CI stability!
> > >> >
> > >> > However I disagree with some points in this thread:
> > >> >
> > >> > - "We are at approximately 3h for a full successful run."
> > >> > => Looking at Jenkins I see the last successful runs oscillating
> > between
> > >> > 1h53 and 2h42 with a mean that seems to be at 2h20. Or are you
> talking
> > >> > about something different than the jenkins CI run?
> > >> >
> > >> > - "For this I propose working towards moving tests from CI to
> nightly,
> > >> > specially
> > >> > the ones that take most time or do black box testing with full
> > training
> > >> of
> > >> > models. And addressing flaky tests by either fixing them or
> *disabling
> > >> > them." *
> > >> > => Is there any evidence that some serious effort has been spent
> > trying
> > >> to
> > >> > fix the flaky tests? I know Sheng and Marco have worked to
> > >> consolidating a
> > >> > list of Flaky tests, but I think simply disabling tests will just
> make
> > >> the
> > >> > platform weaker. Let's organize a flaky test week where we each take
> > on
> > >> a
> > >> > couple of these flaky tests and hopefully we should make good
> progress
> > >> > towards stabilizing the CI.
> > >> >
> > >> > -"I'd like to disable flaky tests until they're fixed."
> > >> > => Wishful thinking IMO, we know this never happens, if we can't
> make
> > >> time
> > >> > now to fix them, we'll never go back and fix them.
> > >> >
> > >> > "I would want a turnaround time of less than 30 minutes and 0%
> failure
> > >> rate
> > >> > on master."
> > >> >  => With current timing, this means barely finishing the build step.
> > Do
> > >> we
> > >> > propose dropping some platforms for building?
> > >> >
> > >> > I agree with some points:
> > >> >
> > >> > "Won't we end up in the same situation with so many flaky tests?" =>
> > >> pretty
> > >> > sure it will
> > >> > "This could be set to 100% for nightly, for example."[for the
> release]
> > >> =>
> > >> > That would be a given to me
> > >> > "I'm also currently working on a system that tracks all test
> failures,
> > >> so
> > >> > this will also cover nightly tests. This will give us actionable
> data
> > "
> > >> =>
> > >> > Awesome, that would be great to have data on that to help prioritize
> > >> what
> > >> > to fix!
> > >> >
> > >> > I personally think if we disable most tests and move them to nightly
> > >> tests,
> > >> > we will decrease the trust and stability of the platform and it
> leaves
> > >> the
> > >> > door open to conflicting changes creating hard to debug failures. I
> > >> think
> > >> > the biggest potential win here is reducing test flakiness. That's
> the
> > >> one
> > >> > that is killing the productivity, we can redesign the test pipeline
> to
> > >> run
> > >> > integration and unit test in parallel and that would give us
> straight
> > >> away
> > >> > a 30 minutes reduced time in the CI run. Then we'd be always at <2h
> > for
> > >> a
> > >> > build, which seems reasonable if it never fails for no reason.
> > >> >
> > >> > Thomas
> > >> >
> > >> > 2018-06-07 8:27 GMT-07:00 Marco de Abreu <
> > marco.g.ab...@googlemail.com>
> > >> :
> > >> >
> > >> > > Yeah, I think we are at the point at which we have to disable
> > tests..
> > >> > >
> > >> > > If a test fails in nightly, the commit would not be reverted since
> > >> it's
> > >> > > hard to pin a failure to a specific PR. We will have reporting for
> > >> > failures
> > >> > > on nightly (they have proven to be stable, so we can enable it
> right
> > >> from
> > >> > > the beginning). I'm also currently working on a system that tracks
> > all
> > >> > test
> > >> > > failures, so this will also cover nightly tests. This will give us
> > >> > > actionable data which allows us to define acceptance criteria for
> a
> > >> > > release. E.g. if the test success rate is below X%, a release can
> > not
> > >> be
> > >> > > made. This could be set to 100% for nightly, for example.
> > >> > >
> > >> > > It would definitely be good if we could determine which tests are
> > >> > required
> > >> > > to run and which ones are unnecessary. I don't really like the
> flag
> > in
> > >> > the
> > >> > > comment (and also it's hard to integrate). A good idea would be
> some
> > >> > > analytics on the changed file content. If we have this data, we
> > could
> > >> > > easily enable and disable different jobs. Since this behaviour is
> > >> > entirely
> > >> > > defined in GitHub, I'd like to invite everybody to submit a PR.
> > >> > >
> > >> > > -Marco
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Thu, Jun 7, 2018 at 5:20 PM Aaron Markham <
> > >> aaron.s.mark...@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > I'd like to disable flaky tests until they're fixed.
> > >> > > > What would the process be for fixing a failure if the tests are
> > done
> > >> > > > nightly? Would the commit be reverted? Won't we end up in the
> same
> > >> > > > situation with so many flaky tests?
> > >> > > >
> > >> > > > I'd like to see if we can separate the test pipelines based on
> the
> > >> > > content
> > >> > > > of the commit. I think that md, html, and js updates should fly
> > >> through
> > >> > > and
> > >> > > > not have to go through GPU tests.
> > >> > > >
> > >> > > > Maybe some special flag added to the comment?
> > >> > > > Is this possible?
> > >> > > >
> > >> > > >
> > >> > > > On Wed, Jun 6, 2018 at 10:37 PM, Pedro Larroy <
> > >> > > > pedro.larroy.li...@gmail.com>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Hi Team
> > >> > > > >
> > >> > > > > The time to validate a PR is growing, due to our number of
> > >> supported
> > >> > > > > platforms and increased time spent in testing and running
> > >> models.  We
> > >> > > are
> > >> > > > > at approximately 3h for a full successful run.
> > >> > > > >
> > >> > > > > This is compounded with the failure rate of builds due to
> flaky
> > >> tests
> > >> > > of
> > >> > > > > more than 50% which is a big drag in developer productivity if
> > you
> > >> > can
> > >> > > > only
> > >> > > > > get one or two CI runs to a change per day.
> > >> > > > >
> > >> > > > > I would want a turnaround time of less than 30 minutes and 0%
> > >> failure
> > >> > > > rate
> > >> > > > > on master.
> > >> > > > >
> > >> > > > > For this I propose working towards moving tests from CI to
> > >> nightly,
> > >> > > > > specially the ones that take most time or do black box testing
> > >> with
> > >> > > full
> > >> > > > > training of models. And addressing flaky tests by either
> fixing
> > >> them
> > >> > or
> > >> > > > > disabling them.
> > >> > > > >
> > >> > > > > I would like to check if there's consensus on this previous
> plan
> > >> so
> > >> > we
> > >> > > > are
> > >> > > > > aligned on pursuing this common goal as a shared effort.
> > >> > > > >
> > >> > > > > Pedro.
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Reply via email to