Hello everyone,

I'm answering both of your emails here (Elena first, then Sergei).
On Thu, May 22, 2014 at 4:12 PM, Elena Stepanova <[email protected]> wrote:

> I suggest staying with the terminology, for clarity.

You are right. I'll stick to MTR terminology.

> But even on an ideal data set the mixed approach should still be most
> efficient, so it should be okay to use it even if some day we fix all the
> broken tests and collect reliable data.

Yes, I agree. I'm keeping the Mixed (Branch/Platform) approach.

>> 2. Include a new measure that increases relevancy: Time since last
>> run.
>>
>> The relevancy index should have a component that makes the test more
>> relevant the longer it spends not running
>
> I agree with the idea, but have doubts about the criteria.
> I think you should measure not the time, but the number of test runs that
> happened since the last time the test was run (it would be even better if
> we could count the number of revisions, but that's probably not easy).
> The reason is that some branches are very active, while others can be
> extremely slow. So, with the same time-based coefficient the relevancy of a
> test can spike between two consecutive test runs just because they happened
> a month apart, but will be changing too slowly on a branch which
> has a dozen commits a day.

Yes, I agree with you on this. This is what I had in mind, but I couldn't express it properly in my email : )

>> 3. Include also correlation. I still don't have a great idea of how
>> correlation will be considered, but it's something like this:
>> 1. The data contains the list of test_runs where each test_suite has
>> failed. If two test suites have failed together a certain percentage of
>> times (>30%?), then when test A fails, the relevancy index of test B also
>> goes up... and when test A runs without failing, the relevancy index of
>> test B goes down too.
>
> We'll need to see how it goes.
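A minimal sketch of the staleness component as Elena suggests it, counting test runs on the branch rather than wall-clock time since the test last ran. The names `TestHistory`, `relevancy` and `staleness_weight`, the use of a failure rate as the base relevancy, and the linear bonus are all hypothetical illustrations, not the actual simulation code.

```python
from dataclasses import dataclass


@dataclass
class TestHistory:
    """Hypothetical per-test record kept for one branch/platform pair."""
    failure_rate: float        # base relevancy, e.g. recent failure frequency
    last_run_index: int = -1   # index of the last test run that executed this test; -1 = never


def relevancy(history: TestHistory, current_run_index: int,
              staleness_weight: float = 0.01) -> float:
    """Base relevancy plus a bonus that grows with the number of test runs
    (not elapsed time) since the test was last executed."""
    if history.last_run_index < 0:
        # Hypothetical choice: a test that never ran counts as stale since run 0.
        runs_since_last = current_run_index
    else:
        runs_since_last = current_run_index - history.last_run_index
    return history.failure_rate + staleness_weight * runs_since_last
```

Because the counter only advances when the branch actually gets a test run, a slow branch with a month between runs sees a bonus of one step, while a branch with a dozen commits a day accumulates it quickly.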
> In real life correlation of this kind does exist, but I'd say much more
> often related failures happen due to some environmental problems, so the
> presumed correlation will be fake.

Good point. Let's see how the numbers play out, but I think you are right that this will end up with a severe bias due to test blowups and failures caused by environmental problems.

> I think in any case we'll have to rely on the fact that your script will
> choose tests not from the whole universe of tests, but from an initial list
> that MTR produces for this particular test run. That is, it will go
> something like this:
> - test run is started in buildbot;
> - MTR collects test cases to run, according to the startup parameters, as
>   it always does;
> - the list is passed to your script;
> - the script filters it according to the algorithm that you developed,
>   keeps only a small portion of the initial list, and passes it back to MTR;
> - MTR runs the requested tests.
>
> That is, you do exclusion of tests rather than inclusion.
>
> This will solve two problems:
> - first test run: when a new test is added, only MTR knows about it,
>   buildbot doesn't; so, when MTR passes to you a test that you know nothing
>   about (and assuming that we do have a list of all executed tests in
>   buildbot), you'll know it's a new test and will act accordingly;
> - abandoned tests: MTR just won't pass them to your script, so it won't
>   take them into account.

Great. This is good to know; it gives me a more precise idea of how the project would fit into MariaDB development.

On Thu, May 22, 2014 at 5:39 PM, Sergei Golubchik <[email protected]> wrote:

>> - *test_suite, test suite, test case*: When I say test suite or test
>>   case, I am referring to a single test file. For instance
>>   '*pbxt.group_min_max*'. They are the ones that fail, and whose failures
>>   we want to attempt to predict.
>
> may I suggest distinguishing between a test *suite* and a test *case*?
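The exclusion workflow Elena describes could look roughly like this on the script side. This is a sketch under assumptions: `filter_tests`, its arguments, and the rule that new tests are always kept (even beyond the budget) are hypothetical, and how MTR would actually hand the list over is not specified here.

```python
from typing import Dict, List, Set


def filter_tests(candidates: List[str], known_tests: Set[str],
                 scores: Dict[str, float], budget: int) -> List[str]:
    """Exclusion-style selection: start from the list MTR collected for this
    run and keep only a small portion of it.

    - Tests MTR passes in that the history has never seen are treated as new
      and always kept (hypothetical policy; they may exceed the budget).
    - Tests in the history but absent from `candidates` (abandoned tests) are
      never considered, because only `candidates` is examined.
    """
    new_tests = [t for t in candidates if t not in known_tests]
    old_tests = [t for t in candidates if t in known_tests]
    # Rank previously seen tests by relevancy, highest first.
    old_tests.sort(key=lambda t: scores.get(t, 0.0), reverse=True)
    remaining = max(budget - len(new_tests), 0)
    keep = set(new_tests + old_tests[:remaining])
    # Preserve MTR's original ordering for the tests that survive the filter.
    return [t for t in candidates if t in keep]
```

For example, with candidates `["main.alias", "pbxt.group_min_max", "main.new_test", "main.join"]` (illustrative names), a budget of 2, and `pbxt.group_min_max` scored highest, the script would hand back `pbxt.group_min_max` and the unseen `main.new_test`.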
> the latter is usually one test file, but a suite (for mtr) is a
> directory with many test files. Like, "main", "pbxt", etc.

Right. I didn't define this properly. Let's keep the definitions exactly as in MTR, as Elena suggested.

> I don't think you should introduce artificial limitations that make the
> recall worse, because they "look realistic".
>
> You can make it realistic instead of making it look realistic: simply
> pretend that your code is already running on buildbot and limits the number
> of tests to run. So, if the test didn't run, you don't have any failure
> information about it.
>
> And then you only need to do what improves recall, nothing else :)
>
> (of course, to calculate the recall you need to use all failures,
> even for tests that you didn't run)

Yes, my code *already works this way*. It doesn't consider failure information from tests that were not supposed to run. The graphs I sent were generated by scripts that ran this way. Of course, the recall is just the number of spotted failures divided by the total number of known failures : )

Anyway, with all this, I will get to work on adapting the simulation a little bit:
- Time since last run will also affect the relevancy of a test
- I will try to use the list of changed files from commits to make sure new tests start running right away

Any other comments are welcome.

Regards
Pablo
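The simulation Sergei describes (replay history as if the script were already deployed, learn only from tests that ran, but compute recall over all failures) can be sketched as below. The run format, `simulate_recall`, and the selector callback are hypothetical stand-ins, not the actual simulation scripts.

```python
from typing import Callable, Dict, List, Set


def simulate_recall(runs: List[Dict[str, Set[str]]],
                    select: Callable[[List[str], Dict[str, int]], List[str]]) -> float:
    """Replay historical test runs as if the selection script were deployed.

    Each run is a dict with:
      "candidates": test cases MTR would have run,
      "failures":   test cases that actually failed in that run.
    The selector only learns about failures of tests it chose to run, but
    recall is computed against all failures, including skipped tests.
    """
    observed_failures: Dict[str, int] = {}
    caught = 0
    total = 0
    for run in runs:
        chosen = set(select(sorted(run["candidates"]), observed_failures))
        failures = run["failures"]
        total += len(failures)
        caught += len(failures & chosen)
        # Only failures of tests that actually ran become known to the selector.
        for test in failures & chosen:
            observed_failures[test] = observed_failures.get(test, 0) + 1
    return caught / total if total else 1.0
```

A selector that keeps every candidate scores a recall of 1.0, while one that keeps a single test and never happens to pick the failing one never learns about that failure, which is exactly the effect of not having failure information for tests that didn't run.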
_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp

