TL;DR: I'm going to push https://issues.apache.org/jira/browse/SOLR-12027 in a day. Let me know if you think it's a bad idea.
On Fri, Feb 23, 2018 at 8:06 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Testing distributed systems requires, well, distributed systems, which
> is what starting clusters is all about. The great leap of faith of
> individual-method unit testing is that if all the small parts are
> tested, combining them in various ways will "just work". This is
> emphatically not true with distributed systems.
>
> Which is also one of the reasons some of the tests are long. It takes
> time (as you pointed out) to set up a cluster. So once a cluster is
> started, testing a bunch of things amortizes the expense of setting it
> up. If each test of some bit of distributed functionality set up and
> tore down a cluster, that would extend the time it takes to run a
> full test suite by quite a bit. Note this is mostly a problem in
> Solr; Lucene tests tend to run much faster.
>
> What Dawid said about randomness. All the randomization functions are
> controlled by the "seed"; that's what the "reproduce with" line in the
> results is all about. That "controlled randomization" has uncovered
> any number of bugs for obscure things that would have been vastly more
> painful to discover otherwise. One example I remember went along the
> lines of "this particular functionality is broken when operating
> system X thinks it's in the Turkish locale". Which is _also_ why all
> tests must use the framework's random() method provided by
> LuceneTestCase and never the Java random functions.
>
> For that matter, one _other_ problem uncovered by the randomness is
> that tests in a suite are executed in a different order with different
> seeds, so side effects of one test method that would affect another
> are flushed out.
>
> Mind you, this doesn't help with race conditions that are sensitive
> to, say, the clock speed of the machine you're running on...
>
> All that said, there's plenty of room for improving our tests. I'm
> sure there are tests that spin up a cluster that don't need to.
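[A self-contained sketch of the principle Erick describes, not code from the Lucene test framework itself: when every random decision flows from a single seed, replaying that seed replays the exact same scenario, which is what the "reproduce with" line relies on. The class and method names here (SeedDemo, scenario) are illustrative only.]

```java
import java.util.Arrays;
import java.util.Random;

public class SeedDemo {
    // Derive an entire "random" test scenario from one seed, the way a
    // randomized test framework does. java.util.Random is documented to
    // be deterministic for a given seed, so the same seed always yields
    // the same sequence of decisions.
    static int[] scenario(long seed) {
        Random r = new Random(seed);
        int[] ops = new int[5];
        for (int i = 0; i < ops.length; i++) {
            ops[i] = r.nextInt(100); // e.g. which operation to run next
        }
        return ops;
    }

    public static void main(String[] args) {
        int[] firstRun  = scenario(0xDEADBEEFL);
        int[] secondRun = scenario(0xDEADBEEFL);
        // Identical seeds reproduce the identical scenario, so a failure
        // seen once can be replayed exactly from its seed.
        System.out.println(Arrays.equals(firstRun, secondRun));
    }
}
```

This is also why, in Erick's words, tests must draw randomness only from the framework-provided random() and never from their own Random instances: a privately seeded Random breaks the chain back to the reported seed.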
> All patches welcome, of course.
>
> Best,
> Erick
>
>
> On Fri, Feb 23, 2018 at 8:20 AM, Dawid Weiss <dawid.we...@gmail.com> wrote:
> >> Randomness makes it difficult to correlate a failure to the commit
> >> that made the test fail (as was pointed out earlier in the
> >> discussion). If each execution path is different, it may very well be
> >> that a failure you experience was introduced several commits ago, so
> >> it may not be your fault.
> >
> > This is true only to a certain degree. If you don't randomize, all you
> > do is essentially run a fixed scenario. This protects you against a
> > regression in this particular state, but it doesn't help in
> > discovering new corner cases or environment quirks, which would be
> > prohibitive to run as a full Cartesian product of all possibilities.
> > So there is a tradeoff here, and most folks in this project have agreed
> > to it. If you look at how many problems randomization has helped
> > discover, I think it's a good tradeoff.
> >
> > Finally: your scenario can actually be reproduced with ease. Run the
> > tests with a fixed seed before you apply a patch and after you apply
> > it; if there is no regression you can assume your patch is fine (but
> > it doesn't mean it won't fail later on a different seed, which nobody
> > will blame you for).
> >
> > Dawid
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org

--
Sincerely yours
Mikhail Khludnev
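[Dawid's fixed-seed check can be sketched in plain Java. The hypothetical sumBefore/sumAfter methods stand in for the code under test before and after a patch; with a pinned seed both runs see identical random input, so any difference in output is caused by the patch rather than by chance. This is a minimal illustration of the idea, not how the Lucene build actually pins seeds.]

```java
import java.util.Random;

public class FixedSeedCheck {
    // Hypothetical "before" and "after" versions of the code under test.
    static int sumBefore(int[] data) {
        int s = 0;
        for (int v : data) s += v;
        return s;
    }

    static int sumAfter(int[] data) {
        int s = 0;
        for (int v : data) s += v;
        return s;
    }

    // Build the random input from a fixed seed so both versions are
    // exercised on the exact same scenario.
    static int[] input(long seed) {
        Random r = new Random(seed);
        int[] data = new int[10];
        for (int i = 0; i < data.length; i++) {
            data[i] = r.nextInt(1000);
        }
        return data;
    }

    public static void main(String[] args) {
        int[] data = input(42L);
        // Same seed, same input: a mismatch here would point at the
        // patch itself. A clean run on this seed still says nothing
        // about other seeds, exactly as Dawid notes.
        System.out.println(sumBefore(data) == sumAfter(data));
    }
}
```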