Testing distributed systems requires, well, distributed systems, which is what starting clusters is all about. The great leap of faith of individual-method unit testing is that if all the small parts are tested, combining them in various ways will "just work". This is emphatically not true with distributed systems.
Which is also one of the reasons some of the tests are long. It takes time (as you pointed out) to set up a cluster. So once a cluster is started, testing a bunch of things amortizes the expense of setting it up. If each test of some bit of distributed functionality set up and tore down a cluster, that would extend the time it takes to run a full test suite by quite a bit. Note this is mostly a problem in Solr; Lucene tests tend to run much faster.

What Dawid said about randomness. All the randomization functions are controlled by the "seed"; that's what the "reproduce with" line in the test results is all about. That "controlled randomization" has uncovered any number of bugs in obscure situations that would have been vastly more painful to discover otherwise. One example I remember went along the lines of "this particular functionality is broken when operating system X thinks it's in the Turkish locale". Which is _also_ why all tests must use the random() method provided by the LuceneTestCase framework and never the raw Java random functions.

For that matter, one _other_ class of problem uncovered by the randomness is that the test methods in a suite are executed in a different order with different seeds, so side effects of one test method that would affect another are flushed out. Mind you, this doesn't help with race conditions that are sensitive to, say, the clock speed of the machine you're running on....

All that said, there's plenty of room for improving our tests. I'm sure there are tests that spin up a cluster that don't need to. All patches welcome, of course.

Best,
Erick

On Fri, Feb 23, 2018 at 8:20 AM, Dawid Weiss <dawid.we...@gmail.com> wrote:
>> Randomness makes it difficult to correlate a failure to the commit that
>> made the test fail (as was pointed out earlier in the discussion). If each
>> execution path is different, it may very well be that a failure you
>> experience was introduced several commits ago, so it may not be your fault.
>
> This is true only to a certain degree.
> If you don't randomize, all you
> do is essentially run a fixed scenario. This protects you against a
> regression in this particular state, but it doesn't help in
> discovering new corner cases or environment quirks, and running the
> full Cartesian product of all possibilities would be prohibitive.
> So there is a tradeoff here, and most folks in this project have
> agreed to it. If you look at how many problems randomization has
> helped discover, I think it's a good tradeoff.
>
> Finally: your scenario can actually be reproduced with ease. Run the
> tests with a fixed seed before you apply a patch and after you apply
> it... if there is no regression you can assume your patch is fine (but
> it doesn't mean it won't fail later on on a different seed, which
> nobody will blame you for).
>
> Dawid
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> ---------------------------------------------------------------------
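To make the seed idea above concrete, here's a toy sketch. This is plain java.util.Random, NOT the actual LuceneTestCase/randomizedtesting machinery, and the class and method names are made up for illustration. It shows the two properties discussed in the thread: the same master seed replays the exact same scenario (so failures reproduce), and the seed also drives the order in which test methods run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of "controlled randomization": every random
// decision in a run flows from one master seed, so re-running with the
// same seed replays the identical scenario.
public class SeededRunDemo {

    // Derive a full "test plan" (method order + an environment choice,
    // e.g. locale) from a single seed. All names here are invented.
    static List<String> plan(long seed) {
        Random random = new Random(seed); // the per-run master Random
        List<String> methods =
            new ArrayList<>(Arrays.asList("testAdd", "testDelete", "testQuery"));
        // Execution order of test methods depends on the seed, which
        // flushes out inter-test side effects.
        Collections.shuffle(methods, random);
        // An environment quirk chosen randomly, e.g. the Turkish-locale case.
        String locale = random.nextBoolean() ? "tr_TR" : "en_US";
        List<String> result = new ArrayList<>(methods);
        result.add(locale);
        return result;
    }

    public static void main(String[] args) {
        // Same seed => identical plan, so a failing run reproduces exactly.
        System.out.println(plan(0xDEADBEEFL).equals(plan(0xDEADBEEFL))); // true
        // A different seed generally yields a different order/environment.
        System.out.println(plan(42L));
    }
}
```

This is also why a test that calls `new Random()` directly breaks reproducibility: that randomness is outside the master seed, so the "reproduce with" line (which feeds the seed back in, e.g. via `-Dtests.seed=...`) can no longer replay the failure.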