It's a classic problem: you can't string N things together serially and expect high reliability. 5,000 tests in a row isn't going to give you a bunch of 9's -- even if each individual test passes 99.99% of the time, the chance of all 5,000 passing is about 0.9999^5000, or roughly 61%. It feels to me that the test frameworks themselves should support a more robust model -- like a way to tag a test as "retry me up to N times before you really consider me a failure" or something like that.
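For illustration, a minimal sketch of what such "retry up to N times" semantics could look like. `RetryRunner`, `runWithRetries`, and `maxAttempts` are hypothetical names for this sketch, not an existing framework API:

```java
// Hypothetical sketch: retry a test body up to maxAttempts times, and only
// report a failure if every attempt fails. Not a real framework feature.
public class RetryRunner {

    /**
     * Runs the test body up to maxAttempts times (maxAttempts >= 1),
     * returning on the first successful attempt. Assertion failures are
     * retried; only if every attempt fails is the last failure rethrown.
     */
    public static void runWithRetries(Runnable testBody, int maxAttempts) {
        AssertionError lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                testBody.run();
                return; // passed on this attempt: the flake is absorbed
            } catch (AssertionError e) {
                lastFailure = e; // remember the failure and try again
            }
        }
        throw lastFailure; // failed on every attempt: a real failure
    }

    public static void main(String[] args) {
        // A "flaky" test that fails on its first two attempts, then passes.
        int[] calls = {0};
        runWithRetries(() -> {
            calls[0]++;
            if (calls[0] < 3) throw new AssertionError("flaky failure");
        }, 5);
        System.out.println("attempts=" + calls[0]); // prints attempts=3
    }
}
```

The key design point is that a retried pass is still a pass, so a 1-in-10 flake rate per test turns into a 1-in-100 spurious build failure with a single retry.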
Ron

On Fri, Mar 8, 2019 at 11:40 AM Stanislav Kozlovski <stanis...@confluent.io> wrote:

> > We internally have an improvement for half a year now which reruns the
> > flaky test classes at the end of the test gradle task, lets you know that
> > they were rerun and probably flaky. It fails the build only if the second
> > run of the test class was also unsuccessful. I think it works pretty well,
> > we mostly have green builds. If there is interest, I can try to
> > contribute that.
>
> That does sound very intriguing. Does it rerun the test classes that failed
> or some known, marked classes? If it is the former, I can see a lot of
> value in having that automated in our PR builds. I wonder what others think
> of this.
>
> On Thu, Feb 28, 2019 at 6:04 PM Viktor Somogyi-Vass <viktorsomo...@gmail.com> wrote:
>
> > Hey All,
> >
> > Thanks for the loads of ideas.
> >
> > @Stanislav, @Sonke
> > I probably left it out from my email, but I really imagined this as a
> > case-by-case change. If we think that it wouldn't cause problems, then it
> > might be applied. That way we'd limit the blast radius somewhat. The
> > 1 hour gain is really just the most optimistic scenario; I'm almost sure
> > that not every test could be transformed to use a common cluster.
> > We internally have an improvement for half a year now which reruns the
> > flaky test classes at the end of the test gradle task, lets you know that
> > they were rerun and probably flaky. It fails the build only if the second
> > run of the test class was also unsuccessful. I think it works pretty well,
> > we mostly have green builds. If there is interest, I can try to
> > contribute that.
> >
> > > I am also extremely annoyed at times by the amount of coffee I have to
> > > drink before tests finish
> > Just please don't get a heart attack :)
> >
> > @Ron, @Colin
> > You bring up a very good point that it is easier and frees up more
> > resources if we just run change-specific tests, and it's good to know
> > that a similar solution (meaning using a shared resource for testing) has
> > failed elsewhere. I second Ron on the test categorization, although as a
> > first attempt I think a flaky retry plus running only the necessary tests
> > would help with both time saving and effectiveness. It would also be
> > easier to achieve.
> >
> > @Ismael
> > Yea, it'd be interesting to profile the startup/shutdown; I've never done
> > that. Perhaps I'll set some time apart for it :). It's definitely true
> > though that if we see a significant delay there, we wouldn't just improve
> > the efficiency of the tests but also the customer experience.
> >
> > Best,
> > Viktor
> >
> > On Thu, Feb 28, 2019 at 8:12 AM Ismael Juma <isma...@gmail.com> wrote:
> >
> > > It's an idea that has come up before and is worth exploring eventually.
> > > However, I'd first try to optimize the server startup/shutdown process.
> > > If we measure where the time is going, maybe some opportunities will
> > > present themselves.
> > >
> > > Ismael
> > >
> > > On Wed, Feb 27, 2019, 3:09 AM Viktor Somogyi-Vass <viktorsomo...@gmail.com> wrote:
> > >
> > > > Hi Folks,
> > > >
> > > > I've been observing lately that unit tests usually take 2.5 hours to
> > > > run, and a very big portion of these are the core tests, where a new
> > > > cluster is spun up for every test. This takes most of the time.
> > > > I ran a test (TopicCommandWithAdminClient, with 38 tests inside)
> > > > through the profiler, and it shows for instance that running the
> > > > whole class took 10 minutes and 37 seconds, of which the useful time
> > > > was 5 minutes 18 seconds. That's a 100% overhead. Without the
> > > > profiler the whole class takes 7 minutes and 48 seconds, so the
> > > > useful time would be between 3-4 minutes. This is a bigger test
> > > > though; most of them won't take this much.
> > > > There are 74 classes that implement KafkaServerTestHarness, and just
> > > > running :core:integrationTest takes almost 2 hours.
> > > >
> > > > I think we could greatly speed up these integration tests by
> > > > creating the cluster once per class and performing the tests in
> > > > separate methods. I know that this contradicts a little the
> > > > principle that tests should be independent, but recreating the
> > > > cluster for each test is a very expensive operation. Also, if the
> > > > tests act on different resources (different topics, etc.) then it
> > > > might not hurt their independence. There might of course be cases
> > > > where this is not possible, but I think there could be a lot where
> > > > it is.
> > > >
> > > > In the optimal case we could cut the testing time back by
> > > > approximately an hour. This would save resources and give quicker
> > > > feedback for PR builds.
> > > >
> > > > What are your thoughts?
> > > > Has anyone thought about this, or were there any attempts made?
> > > >
> > > > Best,
> > > > Viktor
>
> --
> Best,
> Stanislav
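A minimal sketch of the cluster-once-per-class idea from the thread. `EmbeddedCluster`, its `startCount` counter, and the test methods below are hypothetical stand-ins for a KafkaServerTestHarness-style fixture; the point is only that the expensive startup is paid once per class instead of once per test:

```java
// Sketch: share one expensive cluster across all tests in a class.
// EmbeddedCluster and the test methods are illustrative, not Kafka code.
public class SharedClusterSketch {

    static class EmbeddedCluster {
        static int startCount = 0;      // how many expensive startups we paid for
        void start() { startCount++; }  // stands in for broker/ZooKeeper startup
        void stop() {}                  // stands in for cluster shutdown
    }

    private static EmbeddedCluster cluster; // shared by every test in the class

    // With JUnit these would be @BeforeClass / @AfterClass rather than the
    // per-test @Before / @After that recreates the cluster every time.
    static void setUpClass() { cluster = new EmbeddedCluster(); cluster.start(); }
    static void tearDownClass() { cluster.stop(); }

    // Each test acts on its own topic, so sharing the cluster does not
    // couple the tests to each other.
    static void testCreateTopic()   { /* uses cluster, topic "create-test"   */ }
    static void testDeleteTopic()   { /* uses cluster, topic "delete-test"   */ }
    static void testDescribeTopic() { /* uses cluster, topic "describe-test" */ }

    public static void main(String[] args) {
        setUpClass();
        testCreateTopic();
        testDeleteTopic();
        testDescribeTopic();
        tearDownClass();
        // Three tests, one startup: the per-test startup cost is amortized.
        System.out.println("cluster startups: " + EmbeddedCluster.startCount);
    }
}
```

With the figures quoted above (roughly 100% overhead on a 38-test class), amortizing startup this way is where the estimated hour of savings would come from, at the cost of tests having to avoid mutating shared cluster state.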