Hi All,

I've created a PR from what we have internally for retrying flaky tests. Any
reviews and ideas are welcome: https://github.com/apache/kafka/pull/6506
It basically collects the failed test classes and reruns them at the end. If
they are successful, it overwrites the test report.
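
For the curious, one way to sketch the idea is a Gradle TestListener that
gathers the classes with failing tests during the run, so that a second,
filtered execution can be launched for just those classes. This is a
simplified, hypothetical sketch (made-up class name; the PR is the source of
truth):

import org.gradle.api.tasks.testing.TestDescriptor;
import org.gradle.api.tasks.testing.TestListener;
import org.gradle.api.tasks.testing.TestResult;

import java.util.LinkedHashSet;
import java.util.Set;

// Remembers every class that had at least one failing test.
public class FailedClassCollector implements TestListener {
    private final Set<String> failedClasses = new LinkedHashSet<>();

    @Override public void beforeSuite(TestDescriptor suite) {}
    @Override public void afterSuite(TestDescriptor suite, TestResult result) {}
    @Override public void beforeTest(TestDescriptor test) {}

    @Override
    public void afterTest(TestDescriptor test, TestResult result) {
        if (result.getResultType() == TestResult.ResultType.FAILURE
                && test.getClassName() != null) {
            failedClasses.add(test.getClassName());
        }
    }

    public Set<String> failedClasses() { return failedClasses; }
}

The collected names can then be turned into filters for the rerun, in the
spirit of Gradle's --tests option (e.g. ./gradlew core:test --tests
SomeFlakyTest).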

Thanks,
Viktor

On Mon, Mar 11, 2019 at 1:05 PM Stanislav Kozlovski <stanis...@confluent.io>
wrote:

> I agree with Ron.
> I think improving the framework with a configurable number of retries on
> some tests will yield the highest ROI in terms of passing builds.
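>
> Something like a JUnit 4 rule could be the starting point. A rough sketch
> (hypothetical names, nothing like this exists in the build today):
>
> import org.junit.rules.TestRule;
> import org.junit.runner.Description;
> import org.junit.runners.model.Statement;
>
> // Reruns a failing test up to maxAttempts times and only reports a
> // failure if every attempt failed.
> public class Retry implements TestRule {
>     private final int maxAttempts;
>
>     public Retry(int maxAttempts) { this.maxAttempts = maxAttempts; }
>
>     @Override
>     public Statement apply(Statement base, Description description) {
>         return new Statement() {
>             @Override
>             public void evaluate() throws Throwable {
>                 Throwable last = null;
>                 for (int attempt = 1; attempt <= maxAttempts; attempt++) {
>                     try {
>                         base.evaluate(); // run the actual test body
>                         return;          // it passed, we are done
>                     } catch (Throwable t) {
>                         last = t;
>                         System.err.println(description + ": attempt "
>                             + attempt + "/" + maxAttempts + " failed");
>                     }
>                 }
>                 throw last; // every attempt failed: a real failure
>             }
>         };
>     }
> }
>
> A test class would opt in with something like
> "@Rule public Retry retry = new Retry(3);".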
>
> On Fri, Mar 8, 2019 at 10:48 PM Ron Dagostino <rndg...@gmail.com> wrote:
>
> > It's a classic problem: you can't string N things together serially and
> > expect high reliability.  5,000 tests in a row isn't going to give you a
> > bunch of 9's.  It feels to me that the test frameworks themselves should
> > support a more robust model -- like a way to tag a test as "retry me up to
> > N times before you really consider me a failure" or something like that.
> >
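> > To put rough numbers on that: if each of 5,000 independent tests passes
> > 99.99% of the time, the whole run passes only about 0.9999^5000, roughly
> > 61% of the time. Even four 9's per test doesn't buy the suite a single 9.
> >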
> > Ron
> >
> > On Fri, Mar 8, 2019 at 11:40 AM Stanislav Kozlovski <stanis...@confluent.io>
> > wrote:
> >
> > > > We have had an improvement internally for half a year now which reruns
> > > > the flaky test classes at the end of the Gradle test task and lets you
> > > > know that they were rerun and are probably flaky. It fails the build
> > > > only if the second run of the test class was also unsuccessful. I think
> > > > it works pretty well; we mostly have green builds. If there is
> > > > interest, I can try to contribute that.
> > >
> > > That does sound very intriguing. Does it rerun the test classes that
> > > failed or some known, marked classes? If it is the former, I can see a
> > > lot of value in having that automated in our PR builds. I wonder what
> > > others think of this.
> > >
> > > On Thu, Feb 28, 2019 at 6:04 PM Viktor Somogyi-Vass <viktorsomo...@gmail.com>
> > > wrote:
> > >
> > > > Hey All,
> > > >
> > > > Thanks for all the ideas.
> > > >
> > > > @Stanislav, @Sonke
> > > > I probably left it out of my email, but I really imagined this as a
> > > > case-by-case change. If we think that it wouldn't cause problems, then
> > > > it might be applied. That way we'd limit the blast radius somewhat.
> > > > The one-hour gain is really just the most optimistic scenario; I'm
> > > > almost sure that not every test could be transformed to use a common
> > > > cluster.
> > > > We have had an improvement internally for half a year now which reruns
> > > > the flaky test classes at the end of the Gradle test task and lets you
> > > > know that they were rerun and are probably flaky. It fails the build
> > > > only if the second run of the test class was also unsuccessful. I think
> > > > it works pretty well; we mostly have green builds. If there is
> > > > interest, I can try to contribute that.
> > > >
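> > > > The pass/fail policy boils down to simple set logic. A toy
> > > > illustration with made-up names (not the actual implementation):
> > > >
> > > > import java.util.Arrays;
> > > > import java.util.HashSet;
> > > > import java.util.Set;
> > > >
> > > > public class RerunPolicy {
> > > >     public static void main(String[] args) {
> > > >         Set<String> firstRunFailures =
> > > >             new HashSet<>(Arrays.asList("ATest", "BTest"));
> > > >         Set<String> rerunFailures =
> > > >             new HashSet<>(Arrays.asList("BTest"));
> > > >
> > > >         // Failed first, passed on rerun: flaky, the build stays green.
> > > >         Set<String> flaky = new HashSet<>(firstRunFailures);
> > > >         flaky.removeAll(rerunFailures);
> > > >
> > > >         // Failed in both runs: these actually fail the build.
> > > >         Set<String> broken = new HashSet<>(firstRunFailures);
> > > >         broken.retainAll(rerunFailures);
> > > >
> > > >         System.out.println("flaky: " + flaky + ", broken: " + broken);
> > > >     }
> > > > }
> > > >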
> > > > > I am also extremely annoyed at times by the amount of coffee I have
> > > > > to drink before tests finish
> > > > Just please don't get a heart attack :)
> > > >
> > > > @Ron, @Colin
> > > > You bring up a very good point that it is easier and frees up more
> > > > resources if we just run change-specific tests, and it's good to know
> > > > that a similar solution (meaning using a shared resource for testing)
> > > > has failed elsewhere. I second Ron on the test categorization, though
> > > > as a first attempt I think using a flaky retry plus running only the
> > > > necessary tests would help with both time savings and effectiveness.
> > > > It would also be easier to achieve.
> > > >
> > > > @Ismael
> > > > Yea, it'd be interesting to profile the startup/shutdown; I've never
> > > > done that. Perhaps I'll set aside some time for that :). It's
> > > > definitely true though that if we see a significant delay there, we
> > > > wouldn't just improve the efficiency of the tests but also the
> > > > customer experience.
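> > > >
> > > > Even before a proper profiler run, a crude wall-clock bracket around
> > > > the harness setUp/tearDown would show where the time goes. A throwaway
> > > > sketch with generic, made-up names:
> > > >
> > > > // Minimal timing helper that could bracket cluster startup/shutdown
> > > > // in the test harness; not part of the code base.
> > > > public final class PhaseTimer {
> > > >     private final String phase;
> > > >     private final long startNanos = System.nanoTime();
> > > >
> > > >     private PhaseTimer(String phase) { this.phase = phase; }
> > > >
> > > >     public static PhaseTimer start(String phase) {
> > > >         return new PhaseTimer(phase);
> > > >     }
> > > >
> > > >     public void stop() {
> > > >         System.out.printf("%s took %.2f s%n", phase,
> > > >             (System.nanoTime() - startNanos) / 1e9);
> > > >     }
> > > >
> > > >     public static void main(String[] args) throws InterruptedException {
> > > >         PhaseTimer t = PhaseTimer.start("cluster startup"); // e.g. setUp()
> > > >         Thread.sleep(100); // stands in for the work being measured
> > > >         t.stop();
> > > >     }
> > > > }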
> > > >
> > > > Best,
> > > > Viktor
> > > >
> > > >
> > > >
> > > > On Thu, Feb 28, 2019 at 8:12 AM Ismael Juma <isma...@gmail.com> wrote:
> > > >
> > > > > It's an idea that has come up before and is worth exploring
> > > > > eventually. However, I'd first try to optimize the server
> > > > > startup/shutdown process. If we measure where the time is going,
> > > > > maybe some opportunities will present themselves.
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Wed, Feb 27, 2019, 3:09 AM Viktor Somogyi-Vass <viktorsomo...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Folks,
> > > > > >
> > > > > > I've been observing lately that unit tests usually take 2.5 hours
> > > > > > to run, and a very big portion of these are the core tests, where
> > > > > > a new cluster is spun up for every test. This takes most of the
> > > > > > time. I ran a test class (TopicCommandWithAdminClient, with 38
> > > > > > tests inside) through the profiler, and it shows for instance that
> > > > > > running the whole class took 10 minutes and 37 seconds, of which
> > > > > > the useful time was 5 minutes 18 seconds. That's roughly 100%
> > > > > > overhead. Without the profiler the whole class takes 7 minutes and
> > > > > > 48 seconds, so the useful time would be between 3 and 4 minutes.
> > > > > > This is a bigger test class though; most of them won't take this
> > > > > > long.
> > > > > > There are 74 classes that implement KafkaServerTestHarness, and
> > > > > > just running :core:integrationTest takes almost 2 hours.
> > > > > >
> > > > > > I think we could greatly speed up these integration tests by
> > > > > > creating the cluster just once per class and performing the tests
> > > > > > in separate methods. I know this somewhat contradicts the
> > > > > > principle that tests should be independent, but recreating the
> > > > > > cluster for each test is a very expensive operation. Also, if the
> > > > > > tests are acting on different resources (different topics, etc.)
> > > > > > then it might not hurt their independence. There might be cases of
> > > > > > course where this is not possible, but I think there could be a
> > > > > > lot where it is.
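> > > > > >
> > > > > > As a toy sketch of the shape I mean (FakeCluster stands in for the
> > > > > > real harness; all names here are made up):
> > > > > >
> > > > > > import org.junit.AfterClass;
> > > > > > import org.junit.BeforeClass;
> > > > > > import org.junit.Test;
> > > > > >
> > > > > > public class SharedClusterTest {
> > > > > >     // Stand-in for an expensive-to-start cluster.
> > > > > >     static class FakeCluster {
> > > > > >         void createTopic(String name) { /* no-op stand-in */ }
> > > > > >         void shutdown() {}
> > > > > >     }
> > > > > >
> > > > > >     private static FakeCluster cluster;
> > > > > >
> > > > > >     @BeforeClass
> > > > > >     public static void startCluster() {
> > > > > >         cluster = new FakeCluster(); // once per class, not per test
> > > > > >     }
> > > > > >
> > > > > >     @AfterClass
> > > > > >     public static void stopCluster() {
> > > > > >         cluster.shutdown();
> > > > > >     }
> > > > > >
> > > > > >     @Test
> > > > > >     public void createsTopicA() {
> > > > > >         cluster.createTopic("topic-a"); // each test owns its topic
> > > > > >     }
> > > > > >
> > > > > >     @Test
> > > > > >     public void createsTopicB() {
> > > > > >         cluster.createTopic("topic-b");
> > > > > >     }
> > > > > > }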
> > > > > >
> > > > > > In the optimal case we could cut the testing time back by
> > > > > > approximately an hour. This would save resources and give quicker
> > > > > > feedback for PR builds.
> > > > > >
> > > > > > What are your thoughts?
> > > > > > Has anyone thought about this, or have any attempts been made?
> > > > > >
> > > > > > Best,
> > > > > > Viktor
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> > >
> >
>
>
> --
> Best,
> Stanislav
>
