It's a classic problem: you can't string N things together serially and
expect high reliability.  5,000 tests in a row isn't going to give you a
bunch of 9's.  It feels to me that the test frameworks themselves should
support a more robust model -- like a way to tag a test as "retry me up to
N times before you really consider me a failure" or something like that.
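
For what it's worth, JUnit 4 already lets you build something like this
yourself with a TestRule. A rough, illustrative sketch (the class and names
below are made up, and other frameworks would differ):

    import org.junit.rules.TestRule;
    import org.junit.runner.Description;
    import org.junit.runners.model.Statement;

    /** Retries a test up to maxAttempts times; fails only if every attempt fails. */
    public class Retry implements TestRule {
        private final int maxAttempts;

        public Retry(int maxAttempts) {
            this.maxAttempts = maxAttempts;
        }

        @Override
        public Statement apply(Statement base, Description description) {
            return new Statement() {
                @Override
                public void evaluate() throws Throwable {
                    Throwable last = null;
                    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                        try {
                            base.evaluate();  // run the underlying test
                            return;           // passed, no need to retry
                        } catch (Throwable t) {
                            last = t;
                            System.err.println(description + ": attempt " + attempt + " failed");
                        }
                    }
                    throw last;  // every attempt failed
                }
            };
        }
    }

A test would then opt in with something like:

    @Rule
    public Retry retry = new Retry(3);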

Ron

On Fri, Mar 8, 2019 at 11:40 AM Stanislav Kozlovski <stanis...@confluent.io>
wrote:

> > We have had an internal improvement for half a year now which reruns the
> > flaky test classes at the end of the Gradle test task and lets you know
> > that they were rerun and are probably flaky. It fails the build only if
> > the second run of the test class was also unsuccessful. I think it works
> > pretty well; we mostly have green builds. If there is interest, I can try
> > to contribute that.
>
> That does sound very intriguing. Does it rerun the test classes that failed,
> or some known, marked classes? If it is the former, I can see a lot of
> value in having that automated in our PR builds. I wonder what others think
> of this.
>
> On Thu, Feb 28, 2019 at 6:04 PM Viktor Somogyi-Vass <
> viktorsomo...@gmail.com>
> wrote:
>
> > Hey All,
> >
> > Thanks for the loads of ideas.
> >
> > @Stanislav, @Sonke
> > I probably left it out of my email, but I really imagined this as a
> > case-by-case change. If we think it wouldn't cause problems, then it
> > might be applied. That way we'd limit the blast radius somewhat. The
> > one-hour gain is really just the most optimistic scenario; I'm almost
> > sure that not every test could be transformed to use a common cluster.
> > We have had an internal improvement for half a year now which reruns the
> > flaky test classes at the end of the Gradle test task and lets you know
> > that they were rerun and are probably flaky. It fails the build only if
> > the second run of the test class was also unsuccessful. I think it works
> > pretty well; we mostly have green builds. If there is interest, I can try
> > to contribute that.
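> >
> > To give a rough idea of the mechanism, here is a stand-alone sketch with
> > plain JUnit 4 (our actual implementation hooks into the Gradle test task
> > instead; the class below is made up purely for illustration):
> >
> >   import org.junit.runner.JUnitCore;
> >   import org.junit.runner.Result;
> >
> >   import java.util.ArrayList;
> >   import java.util.List;
> >
> >   public class RerunFailedTestClasses {
> >       public static void main(String[] args) throws Exception {
> >           // First pass: run every test class and remember the failing ones.
> >           List<Class<?>> failed = new ArrayList<>();
> >           for (String name : args) {
> >               Class<?> testClass = Class.forName(name);
> >               if (!JUnitCore.runClasses(testClass).wasSuccessful()) {
> >                   failed.add(testClass);
> >               }
> >           }
> >           // Second pass: a failed class breaks the build only if it fails again.
> >           boolean buildFailed = false;
> >           for (Class<?> testClass : failed) {
> >               System.out.println("Rerunning " + testClass.getName());
> >               Result retry = JUnitCore.runClasses(testClass);
> >               if (!retry.wasSuccessful()) {
> >                   buildFailed = true;
> >               }
> >           }
> >           System.exit(buildFailed ? 1 : 0);
> >       }
> >   }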
> >
> > > I am also extremely annoyed at times by the amount of coffee I have to
> > > drink before tests finish
> > Just please don't get a heart attack :)
> >
> > @Ron, @Colin
> > You bring up a very good point that it is easier and frees up more
> > resources if we just run the tests specific to a change, and it's good to
> > know that a similar solution (meaning using a shared resource for
> > testing) has failed elsewhere. I second Ron on the test categorization,
> > although as a first attempt I think a flaky-test retry plus running only
> > the necessary tests would help with both time savings and effectiveness.
> > It would also be easier to achieve.
> >
> > @Ismael
> > Yeah, it'd be interesting to profile the startup/shutdown; I've never
> > done that. Perhaps I'll set aside some time for it :). It's definitely
> > true though that if we find a significant delay there, fixing it wouldn't
> > just improve the efficiency of the tests but also the customer
> > experience.
> >
> > Best,
> > Viktor
> >
> >
> >
> > On Thu, Feb 28, 2019 at 8:12 AM Ismael Juma <isma...@gmail.com> wrote:
> >
> > > It's an idea that has come up before and is worth exploring eventually.
> > > However, I'd first try to optimize the server startup/shutdown process.
> > > If we measure where the time is going, maybe some opportunities will
> > > present themselves.
> > >
> > > Ismael
> > >
> > > On Wed, Feb 27, 2019, 3:09 AM Viktor Somogyi-Vass <
> > viktorsomo...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Hi Folks,
> > > >
> > > > I've been observing lately that the unit tests usually take 2.5 hours
> > > > to run, and a very big portion of that is the core tests, where a new
> > > > cluster is spun up for every test. This takes most of the time. I ran
> > > > one test class (TopicCommandWithAdminClient, with 38 tests inside)
> > > > through the profiler, and it shows, for instance, that running the
> > > > whole class took 10 minutes and 37 seconds, of which the useful time
> > > > was 5 minutes and 18 seconds. That's roughly 100% overhead. Without
> > > > the profiler the whole class takes 7 minutes and 48 seconds, so the
> > > > useful time would be between 3 and 4 minutes. This is a bigger test
> > > > class though; most of them won't take this long.
> > > > There are 74 classes that implement KafkaServerTestHarness, and just
> > > > running :core:integrationTest takes almost 2 hours.
> > > >
> > > > I think we could greatly speed up these integration tests by creating
> > > > the cluster just once per class and performing the tests in separate
> > > > methods. I know this somewhat contradicts the principle that tests
> > > > should be independent, but recreating the cluster for each test is a
> > > > very expensive operation. Also, if the tests act on different
> > > > resources (different topics, etc.), it might not hurt their
> > > > independence. There will of course be cases where this is not
> > > > possible, but I think there could be a lot where it is.
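> > > >
> > > > To make it a bit more concrete, here's a minimal JUnit 4 sketch of the
> > > > "one cluster per class" idea (the EmbeddedCluster helper below is
> > > > hypothetical, just standing in for the harness setup):
> > > >
> > > >   import org.junit.AfterClass;
> > > >   import org.junit.BeforeClass;
> > > >   import org.junit.Test;
> > > >
> > > >   public class SharedClusterTopicTest {
> > > >       // Hypothetical helper; in reality this would be something like the
> > > >       // KafkaServerTestHarness setup, started once for the whole class.
> > > >       private static EmbeddedCluster cluster;
> > > >
> > > >       @BeforeClass
> > > >       public static void startCluster() {
> > > >           cluster = new EmbeddedCluster(3);  // pay the startup cost once
> > > >           cluster.start();
> > > >       }
> > > >
> > > >       @AfterClass
> > > >       public static void stopCluster() {
> > > >           cluster.stop();
> > > >       }
> > > >
> > > >       @Test
> > > >       public void testCreateTopic() {
> > > >           // Each test method acts on its own topic to stay independent.
> > > >           cluster.createTopic("create-topic-test", 1, (short) 1);
> > > >       }
> > > >
> > > >       @Test
> > > >       public void testDeleteTopic() {
> > > >           cluster.createTopic("delete-topic-test", 1, (short) 1);
> > > >           cluster.deleteTopic("delete-topic-test");
> > > >       }
> > > >   }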
> > > >
> > > > In the optimal case we could cut the testing time by approximately an
> > > > hour. This would save resources and give quicker feedback for PR
> > > > builds.
> > > >
> > > > What are your thoughts?
> > > > Has anyone thought about this before, or have there been any attempts
> > > > at it?
> > > >
> > > > Best,
> > > > Viktor
> > > >
> > >
> >
>
>
> --
> Best,
> Stanislav
>
