TL;DR: I'm going to push https://issues.apache.org/jira/browse/SOLR-12027 in a day. Let me know if you think it's a bad idea.
On Fri, Feb 23, 2018 at 8:06 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Testing distributed systems requires, well, distributed systems, which
> is what starting clusters is all about. The great leap of faith of
> individual-method unit testing is that if all the small parts are
> tested, combining them in various ways will "just work". This is
> emphatically not true with distributed systems.
>
> Which is also one of the reasons some of the tests are long. It takes
> time (as you pointed out) to set up a cluster. So once a cluster is
> started, testing a bunch of things amortizes the expense of setting it
> up. If each test of some bit of distributed functionality set up and
> tore down a cluster, that would extend the time it takes to run a
> full test suite by quite a bit. Note this is mostly a problem in
> Solr; Lucene tests tend to run much faster.
>
> What Dawid said about randomness. All the randomization functions are
> controlled by the "seed"; that's what the "reproduce with" line in the
> results is all about. That "controlled randomization" has uncovered
> any number of bugs for obscure things that would have been vastly more
> painful to discover otherwise. One example I remember went along the
> lines of "this particular functionality is broken when operating
> system X thinks it's in the Turkish locale". Which is _also_ why all
> tests must use the framework's random() method provided by
> LuceneTestCase and never the Java random functions.
>
> For that matter, one _other_ problem uncovered by the randomness is
> that tests in a suite are executed in a different order with different
> seeds, so side effects of one test method that would affect another
> are flushed out.
>
> Mind you, this doesn't help with race conditions that are sensitive
> to, say, the clock speed of the machine you're running on...
>
> All that said, there's plenty of room for improving our tests. I'm
> sure there are tests that spin up a cluster that don't need to.
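[A self-contained sketch of the principle Erick describes, not code from the Lucene test framework itself: when every random decision flows from a single seed, replaying that seed replays the exact same scenario, which is what the "reproduce with" line relies on. The class and method names here (SeedDemo, scenario) are illustrative only.]

```java
import java.util.Arrays;
import java.util.Random;

public class SeedDemo {
    // Derive an entire "random" test scenario from one seed, the way a
    // randomized test framework does. java.util.Random is documented to
    // be deterministic for a given seed, so the same seed always yields
    // the same sequence of decisions.
    static int[] scenario(long seed) {
        Random r = new Random(seed);
        int[] ops = new int[5];
        for (int i = 0; i < ops.length; i++) {
            ops[i] = r.nextInt(100); // e.g. which operation to run next
        }
        return ops;
    }

    public static void main(String[] args) {
        int[] firstRun  = scenario(0xDEADBEEFL);
        int[] secondRun = scenario(0xDEADBEEFL);
        // Identical seeds reproduce the identical scenario, so a failure
        // seen once can be replayed exactly from its seed.
        System.out.println(Arrays.equals(firstRun, secondRun));
    }
}
```

This is also why, in Erick's words, tests must draw randomness only from the framework-provided random() and never from their own Random instances: a privately seeded Random breaks the chain back to the reported seed.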
> All patches welcome, of course.
>
> Best,
> Erick
>
>
> On Fri, Feb 23, 2018 at 8:20 AM, Dawid Weiss <dawid.we...@gmail.com> wrote:
> >> Randomness makes it difficult to correlate a failure to the commit
> >> that made the test fail (as was pointed out earlier in the
> >> discussion). If each execution path is different, it may very well be
> >> that a failure you experience was introduced several commits ago, so
> >> it may not be your fault.
> >
> > This is true only to a certain degree. If you don't randomize, all you
> > do is essentially run a fixed scenario. This protects you against a
> > regression in this particular state, but it doesn't help in
> > discovering new corner cases or environment quirks, which would be
> > prohibitive to run as a full Cartesian product of all possibilities.
> > So there is a tradeoff here, and most folks in this project have agreed
> > to it. If you look at how many problems randomization has helped
> > discover, I think it's a good tradeoff.
> >
> > Finally: your scenario can actually be reproduced with ease. Run the
> > tests with a fixed seed before you apply a patch and after you apply
> > it; if there is no regression you can assume your patch is fine (but
> > it doesn't mean it won't fail later on a different seed, which nobody
> > will blame you for).
> >
> > Dawid
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org

--
Sincerely yours
Mikhail Khludnev
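[Dawid's fixed-seed check can be sketched in plain Java. The hypothetical sumBefore/sumAfter methods stand in for the code under test before and after a patch; with a pinned seed both runs see identical random input, so any difference in output is caused by the patch rather than by chance. This is a minimal illustration of the idea, not how the Lucene build actually pins seeds.]

```java
import java.util.Random;

public class FixedSeedCheck {
    // Hypothetical "before" and "after" versions of the code under test.
    static int sumBefore(int[] data) {
        int s = 0;
        for (int v : data) s += v;
        return s;
    }

    static int sumAfter(int[] data) {
        int s = 0;
        for (int v : data) s += v;
        return s;
    }

    // Build the random input from a fixed seed so both versions are
    // exercised on the exact same scenario.
    static int[] input(long seed) {
        Random r = new Random(seed);
        int[] data = new int[10];
        for (int i = 0; i < data.length; i++) {
            data[i] = r.nextInt(1000);
        }
        return data;
    }

    public static void main(String[] args) {
        int[] data = input(42L);
        // Same seed, same input: a mismatch here would point at the
        // patch itself. A clean run on this seed still says nothing
        // about other seeds, exactly as Dawid notes.
        System.out.println(sumBefore(data) == sumAfter(data));
    }
}
```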