On Wed, Feb 27, 2019, at 10:02, Ron Dagostino wrote:
> Hi everyone.  Maybe providing the option to run it both ways -- start your
> own cluster vs. using one that is pre-started -- might be useful?  Don't
> know how it would work or if it would be useful, but it is something to
> think about.
> 
> Also, while the argument against using a pre-started cluster due to an
> expected increase in test flakiness is reasonable, it is conjecture and may
> not turn out to be correct; if it isn't too much trouble it might be worth
it to actually see if the argument is right, since the decreased-time
benefit is less conjecture and more real.

It's not just a conjecture.  We tried sharing the same cluster between multiple 
test cases in Hadoop.  It often led to tests that were difficult to debug.  A 
test can change the cluster state in a subtle way that causes a following test 
to fail.  The order in which JUnit runs tests is also effectively unpredictable, 
which makes this even more frustrating for developers.

An easier solution to the problem of long test run times is to run only the 
tests that are affected by a particular change.  For example, if you make a 
change to Connect, you shouldn't need to re-run the tests for Clients, since 
your change doesn't impact any of that code.  We had this set up in Hadoop, and 
it helped free up a lot of Jenkins time.
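The change-based test selection described above can be sketched roughly as follows: map each changed file to the Gradle module that owns it and run only that module's tests. This is an illustration only, in Python; the prefix-to-task table is an assumption for the sketch, not Kafka's actual build layout.

```python
# Hypothetical mapping from source-tree prefixes to Gradle test tasks.
# The names here are assumptions for illustration.
MODULE_PREFIXES = {
    "clients/": ":clients:test",
    "connect/": ":connect:test",
    "core/": ":core:test",
    "streams/": ":streams:test",
}

def affected_test_tasks(changed_files):
    """Return the Gradle test tasks touched by a change set.

    Falls back to running everything when a file matches no known
    module (e.g. a change to the build scripts themselves).
    """
    tasks = set()
    for path in changed_files:
        for prefix, task in MODULE_PREFIXES.items():
            if path.startswith(prefix):
                tasks.add(task)
                break
        else:
            # Unknown file: be conservative and run all tests.
            return sorted(set(MODULE_PREFIXES.values()))
    return sorted(tasks)
```

For example, a change touching only `connect/runtime` would select just `:connect:test`, while a change to a top-level build file would select every task.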

best,
Colin

> 
> Ron
> 
> On Wed, Feb 27, 2019 at 10:39 AM Sönke Liebau
> <soenke.lie...@opencore.com.invalid> wrote:
> 
> > Hi,
> >
> > while I am also extremely annoyed at times by the amount of coffee I
> > have to drink before tests finish, I think the argument about flaky
> > tests is valid! The current setup has the benefit that every test case
> > runs on a pristine cluster. If we changed this, we'd need to go through
> > all tests and ensure that topic names are different; that could
> > probably be abstracted by including a timestamp in the name or
> > something like that, but it is an additional potential for failure.
> > Add to this the fact that "JUnit runs tests using a deterministic, but
> > unpredictable order" and the water gets even muddier. This might mean
> > that adding a test case changes the order in which existing test cases
> > are executed, so all of a sudden something breaks that you didn't even
> > touch.
> >
> > Best regards,
> > Sönke
> >
> >
> > On Wed, Feb 27, 2019 at 2:36 PM Stanislav Kozlovski
> > <stanis...@confluent.io> wrote:
> > >
> > > Hey Viktor,
> > >
> > > I am all up for the idea of speeding up the tests. Running the
> > > `:core:integrationTest` command takes an absurd amount of time as is,
> > > and it will continue to go up if we don't do anything about it.
> > > Having said that, I am very scared that your proposal might
> > > significantly increase the flakiness of current and future tests.
> > > Test flakiness is a huge problem we're battling: we don't get green
> > > PR builds very often, and it is very common for one or two flaky
> > > tests to fail in each PR.
> > > We have also found it hard to get a green build for the 2.2 release (
> > > https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/).
> > >
> > > On Wed, Feb 27, 2019 at 11:09 AM Viktor Somogyi-Vass <
> > > viktorsomo...@gmail.com> wrote:
> > >
> > > > Hi Folks,
> > > >
> > > > I've been observing lately that the unit tests usually take 2.5
> > > > hours to run, and a very big portion of that is the core tests,
> > > > where a new cluster is spun up for every test. This takes most of
> > > > the time. I ran one test class (TopicCommandWithAdminClient, with
> > > > 38 tests inside) through the profiler, and it shows for instance
> > > > that running the whole class took 10 minutes and 37 seconds, of
> > > > which the useful time was 5 minutes 18 seconds. That's a 100%
> > > > overhead. Without the profiler the whole class takes 7 minutes and
> > > > 48 seconds, so the useful time would be between 3 and 4 minutes.
> > > > This is a bigger test class, though; most of them won't take this
> > > > much.
> > > > There are 74 classes that implement KafkaServerTestHarness, and
> > > > just running :core:integrationTest takes almost 2 hours.
> > > >
> > > > I think we could greatly speed up these integration tests by
> > > > creating the cluster once per class and performing the tests in
> > > > separate methods. I know that this contradicts the principle that
> > > > tests should be independent a little bit, but recreating the
> > > > cluster for each test seems to be a very expensive operation.
> > > > Also, if the tests act on different resources (different topics,
> > > > etc.) then it might not hurt their independence. There might of
> > > > course be cases where this is not possible, but I think there
> > > > could be a lot where it is.
> > > >
> > > > In the optimal case we could cut the testing time back by
> > > > approximately an hour. This would save resources and give quicker
> > > > feedback for PR builds.
> > > >
> > > > What are your thoughts?
> > > > Has anyone thought about this, or have there been any attempts made?
> > > >
> > > > Best,
> > > > Viktor
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Stanislav
> >
> >
> >
> > --
> > Sönke Liebau
> > Partner
> > Tel. +49 179 7940878
> > OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany
> >
>
