Thanks Uwe. I think that guy actually refers to randomizing parameter/ variable space for testing, not generating tests at random in general. I agree with you that any type of generational/ mutational/ metaheuristic-driven tests generation is not a good way to go in practice. Lucene tests still have a very broad search space, even with just randomized parameters/ data and for this that blog post would apply (?).
Anyway, his is one argument in a broader discussion. I don't know which theoretical approach would be ideal but I know for sure even typical randomization of parameters reveals bugs pretty damn quickly... Dawid On Sun, Jun 24, 2012 at 5:53 PM, Uwe Schindler <[email protected]> wrote: > Hi Dawid, > > I think we do not do completely random testing at all! Random testing (with > the limits mentioned in the article) would mean that we do very random > testing with also randomly generated tests. In contrast our tests all have a > non-random background, means our tests are written with a specific use-case > in mind (there are some exemptions, I agree, see below). So with any random > parameter they basically test what a non-random junit test would also test. > In most cases the randomness is only added on top (to extend the space of > parameters we test) - so the chances to find bugs get greater. If we would > test Lucene only with also randomly generated tests, I agree we would find no > bugs at all. On the other hand, classical junit tests have no randomness at > all, they just test that a specific "requirement" is implemented correctly > and returns the correct result. The problem with those tests is the fact, > that by fixing a bug to do the correct thing in test can break something else > (this is especially important for IR tests, they rely on statistics!!!). You > could write a test that expects a specific output of a lucene query and you > can of course fix all the scoring logic to exactly create that result. Of > course any other similar query can return bogus results. So by executing > random queries, we can also test statistical things like ranking function > works also for a larger set of queries or larger number of documents. > > There are some exceptions: The famous TestRandomChains falls exactly in the > category you mention: They just build a random chain of analyzer components > and the number of combinations you could create is huge! With a few > Tokenizers and a list of 150 TokenFilters you can create an insane amount of > TokenStream combinations (just start to multiply..., it is the formula for > the binomial coefficient). In that case, the chance to find a new bug is > really small. The number of failures at the beginning really shows that the > TokenStreams were in a very bad state! Now we have only few failures (one > more today!), but I am sure there are still billions (ore more) of > combinations that break the TokenStream contracts! > > Uwe > > ----- > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: [email protected] > > >> -----Original Message----- >> From: [email protected] [mailto:[email protected]] On Behalf Of >> Dawid Weiss >> Sent: Sunday, June 24, 2012 3:42 PM >> To: [email protected] >> Cc: [email protected] >> Subject: Random testing performance (theoretical) >> >> An interesting link was given to me today: >> >> "The Central Limit Theorem Makes Random Testing Hard" >> http://blog.regehr.org/archives/660 >> >> Just recently I had a talk at a local university >> (http://idss.cs.put.poznan.pl/site/idss-en.html) and the low theoretical >> probability of hitting _any_ sensible bug was also raised as a concern. This >> is in >> stark contrast to what we see in practice, isn't it? It is an interesting >> phenomenon because it means that (among other possible explanations) the >> parameter/ search space of random tests is just ridden with failures and >> exceptions and bugs and even with close-to-zero probability we still hit a >> lot of >> them :) >> >> Dawid >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: [email protected] For additional >> commands, e-mail: [email protected] > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
