This all sounds great. I'm really happy to hear there's a lot of good work and thought going into making the testing better.
I think there's at least a small ISA verification suite, possibly in qemu, which I used for verifying some of the SSE stuff. It tested things we don't implement and likely wouldn't want to bother with (virtual 8086 mode, for instance), but something like that would be good to throw in there. It might be difficult to find something like that that's really good, since it's a pretty niche sort of thing, but there are other simulators/emulators out there, and I have to imagine somebody has needed something like that before. I'm certain the various CPU vendors have their own verification suites, since they couldn't in good conscience sell CPUs otherwise, but those are likely considered a competitive advantage and wouldn't be available for use in gem5.

Gabe

On Thu, Dec 4, 2014 at 3:12 AM, Ali Saidi via gem5-dev <[email protected]> wrote:

> Hi Gabe/Steve,
>
> As Steve mentioned, I've been working on a new take on the regression
> system in my spare time.
>
> I want to get it a bit more complete before I show it to the world, but
> some of the goals I set out with align with yours.
>
> In particular I'm targeting the following:
> (1) Success/failure being more than stats or 'gem5 terminated because of
> ...'. I have a range of criteria that can be specified per test, from
> finding certain output on the terminal to checks on the stdout/stderr
> files, the stats, etc. This code is extensible, so other criteria could
> easily be added.
> (2) Adding a test should be easy. I think we should be testing our config
> files along with gem5, not a custom set of configs, so the regression
> tester takes command lines to gem5 that should be tested.
> (3) Tests should be able to be selected on many criteria. Right now we
> test architectures and fs/se separately, but everything else is batched
> together.
> It's annoyingly hard to specify requirements like "test the o3
> cpu on all architectures" or "just run the ruby tests."
> (4) Allow simulation on a cluster/cloud if the resources are available to
> the user, but still work on a single machine. This at least provides more
> parallelism in running the regression tests.
> (5) Support for dependent tests. The notion that test A must be run and
> generate output before test B can be run is really powerful, and lets you
> create tests that generate checkpoints and restore from them, or only
> restore from them, which would let us get to the interesting parts of
> tests much more quickly.
> (6) Much better reporting of tests, being able to output differences
> between different regression runs, etc.
>
> I view some of the points you brought up as orthogonal to this, although
> still very important.
>
> Much better unit testing would be great, although I think this is a
> separate kind of testing from what we traditionally do for regression
> tests. I spent a long time trying to bend various unit test frameworks
> into something that we could use to run regression tests, and they all
> had issues. The question is: how can we sensibly unit test some of these
> components without spending huge amounts of effort mocking up interfaces?
> Something we've done which has improved the situation a bit is the NoISA
> tests for the memory system. This is not a unit test but more of a
> directed test. Isolating the memory system and coupling it with a memory
> checker we're about to post should give a better indication of
> correctness in the memory system. A good next goal here is to hand-create
> some of the trickier cases in the memory system as traces we can replay
> and verify quickly, instead of relying on a few CPUs to hopefully
> generate those cases over millions or billions of simulated instructions.
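To make goals (1), (3), and (5) above concrete, here is a minimal sketch of what such a declarative test description could look like. Everything in it is invented for illustration: the class, the criteria helpers, the tags, and the command lines don't come from any real framework.

```python
# Hypothetical sketch only; no names here belong to a real test framework.

class Test:
    def __init__(self, name, cmd, criteria, tags=(), depends_on=None):
        self.name = name
        self.cmd = cmd                 # the gem5 command line under test
        self.criteria = criteria      # callables: outputs dict -> bool
        self.tags = set(tags)         # e.g. {"o3", "arm", "ruby", "fs"}
        self.depends_on = depends_on  # test that must run first (checkpoints)

def terminal_contains(text):
    """Criterion: the simulated terminal output must contain `text`."""
    def check(outputs):
        return text in outputs.get("terminal", "")
    return check

def exited_cleanly(outputs):
    """Criterion: the simulation reached a normal exit."""
    return "Exiting @ tick" in outputs.get("stdout", "")

def select(tests, wanted):
    """Pick every test whose tag set covers all of the requested tags."""
    return [t for t in tests if wanted <= t.tags]

# A checkpoint-producing test and a test that depends on its output:
ckpt = Test("arm-o3-boot-ckpt",
            "configs/example/fs.py --checkpoint-at-end",      # illustrative
            criteria=[exited_cleanly],
            tags={"arm", "fs", "o3"})
restore = Test("arm-o3-boot-restore",
               "configs/example/fs.py --restore-from m5out/ckpt",  # illustrative
               criteria=[exited_cleanly, terminal_contains("login:")],
               tags={"arm", "fs", "o3"},
               depends_on=ckpt)
```

With something like this, `select([ckpt, restore], {"o3", "arm"})` answers "test the o3 cpu on arm", and a runner could order tests by following the `depends_on` edges so checkpoints exist before the restores that need them.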
> Similarly, we should be able to do much better than the SPEC benchmarks
> we run today for regression tests. You're right that ideally we don't
> need to run the same loop a million times in a test. Any ideas where we
> could get something better? Preferably something that has self-checking
> code, is freely distributable, and covers many instructions. I wonder if
> LLVM or gcc has a repository of test code that might be useful?
>
> Finally, I agree that a CI system that could automatically run
> regressions when a patch was posted would be a huge improvement over
> where we are today.
>
> Thanks,
> Ali
>
> On 12/4/14, 7:56 AM, "Gabe Black via gem5-dev" <[email protected]> wrote:
>
>> What I'd like is for us not to use stats as a pass/fail criterion. I'm
>> not sure how that would work, but using the stats is pretty fragile and
>> hard to maintain. It's tricky because you want to make sure the stats
>> themselves are still correct, but there are lots of "correct" stats
>> which are different. I agree that automatically deciding how much the
>> stats should change is not feasible. I haven't had a chance to read that
>> wiki page, but one thing I remember, perhaps from the last time this
>> came up, is that the regressions we run are benchmarks that do the same
>> thing many times to get steady-state behavior. To verify something is
>> correct, we don't need to loop over the same block of code thousands or
>> millions of times. We could probably make things a lot faster without
>> losing coverage that way, although changing the regression binaries
>> wouldn't necessarily be very straightforward.
>>
>> While I think there are significant drawbacks to long-running tests, as
>> detailed in my earlier email, there are benefits to really quick tests
>> too. They could, for instance, be run automatically on every CL as part
>> of a continuous integration system. It would also be practical to run
>> all of them before sending a CL out.
>> Right now I just take a best guess at which regressions are worth
>> running, since running the long ones is a major time commitment,
>> especially across all the ISAs.
>>
>> Gabe
>>
>> On Wed, Dec 3, 2014 at 9:58 PM, Steve Reinhardt via gem5-dev
>> <[email protected]> wrote:
>>
>>> Hi Gabe,
>>>
>>> There's a long history here; I think everyone agrees the status quo wrt
>>> testing is inadequate, but there are a lot of different needs as well.
>>> I won't go into a lot of detail, but there is a wiki page left over
>>> from our last attempt: http://gem5.org/NewRegressionFramework.
>>> Actually, I see now that you contributed to an early version of that.
>>>
>>> I'm not opposed to us having more unit tests and a framework to run
>>> them. Having the ability to integrate unit tests into the regressions
>>> would be a good goal for a new regression system.
>>>
>>> Having better unit tests might provide a nice middle ground between, on
>>> the one hand, running a few tests targeting whatever you're doing (the
>>> bug you're fixing or the feature you're adding), plus a few quick
>>> "hello world" tests (which gives you a feeling that your change is
>>> "probably good", for some definition of probably); and on the other
>>> hand, running the full regression suite. I'm not sure it would replace
>>> either one of those, though. To be honest, I think the testing
>>> situation has bigger problems at this point; there's a lot on that wiki
>>> page, and unit testing isn't even mentioned.
>>>
>>> As far as your points 2 & 3: the regression tests do print out 'FAILED'
>>> vs. 'CHANGED' or something like that, so you can tell the difference
>>> between functional failures and stats changes pretty easily. You can
>>> look at the stats differences in the test output directory to see
>>> exactly what the changes are.
>>> The job of figuring out whether a particular set of stats changes is
>>> "reasonable" given some actual modeling change seems inherently
>>> impossible to automate, so I'm not sure what you're looking for there.
>>>
>>> Ali said he's been working on a new test framework; at this point I
>>> expect that's our best bet for moving forward. I'll let him decide
>>> whether he's ready to say more about it.
>>>
>>> Steve
>>>
>>> On Sun, Nov 23, 2014 at 6:51 AM, Gabe Black via gem5-dev
>>> <[email protected]> wrote:
>>>
>>>> Hi everybody. I'd like to start a conversation about testing
>>>> strategies and gem5. Please let me know if my understanding is out of
>>>> date, but I think the primary mechanism we use for testing is running
>>>> benchmarks, booting, etc., and making sure the stats haven't changed.
>>>> There are a few things that make that not so great.
>>>>
>>>> 1. Benchmarks can take a LONG time to run. I'd like to know whether my
>>>> change is probably good in a couple of seconds, not a couple of hours.
>>>> 2. There isn't much of an indication of *what* went wrong, just that
>>>> something somewhere changed.
>>>> 3. There isn't much of an indication of *whether* something went
>>>> wrong. For a certain class of changes, it's reasonable to expect the
>>>> stats to stay the same. For instance, a simulator performance
>>>> optimization shouldn't change the stats. If you add a new device,
>>>> change how execution works, fix some microcode, etc., you just have to
>>>> guesstimate whether the amount of change looks reasonable and update
>>>> the stats. Which, per 1, can take hours.
>>>> 4. Merge conflicts. If two people make changes that affect the stats,
>>>> one will go in first, and the other person will have to rebase on top
>>>> of those changes and rerun the stats. Which, per 1, can take hours.
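Steve's FAILED vs. CHANGED distinction above boils down to: a functional failure means the run itself broke, while a stats change means the run completed but the numbers moved. A minimal sketch of that classification, assuming a simplified "name value" stats line format (the real gem5 stats dump also carries descriptions and more structure):

```python
def parse_stats(text):
    """Parse 'name value ...' lines into a dict (simplified format)."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            try:
                stats[parts[0]] = float(parts[1])
            except ValueError:
                pass  # skip non-numeric lines
    return stats

def classify(exit_ok, ref_stats, new_stats):
    """FAILED if the run broke, CHANGED (with a diff) if stats moved, else PASSED."""
    if not exit_ok:
        return "FAILED", {}
    diffs = {k: (ref_stats.get(k), new_stats.get(k))
             for k in set(ref_stats) | set(new_stats)
             if ref_stats.get(k) != new_stats.get(k)}
    return ("CHANGED", diffs) if diffs else ("PASSED", {})
```

Reporting the per-stat `diffs` dict directly, rather than a bare pass/fail, is what makes it possible to eyeball whether a change looks "reasonable" without rerunning anything.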
>>>> I know writing new tests isn't what most people want to be doing with
>>>> their time (including me), but as far as I can see this is a big
>>>> shortcoming of the simulator as it stands. I think we would get a lot
>>>> of benefit from more unit tests of both base functionality (we have a
>>>> little of that) and of device models, execution bits, etc. (we have
>>>> none?). We could either expand on the unit test code we have, or bring
>>>> in an existing framework like this one:
>>>>
>>>> https://code.google.com/p/googletest/
>>>>
>>>> I've never used it and don't know much of anything about it.
>>>>
>>>> It *should* be easy for us to use our modularity and object-oriented
>>>> design to pull pieces of the simulator into test harnesses and make
>>>> sure they do reasonable things in isolation. If it isn't, maybe that's
>>>> something we should fix too.
>>>>
>>>> We should also think about how to make it easy/automatic to run unit
>>>> tests, and how to get them to run automatically alongside the nightly
>>>> regression runs.
>>>>
>>>> Gabe
>>>> _______________________________________________
>>>> gem5-dev mailing list
>>>> [email protected]
>>>> http://m5sim.org/mailman/listinfo/gem5-dev
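The "pull pieces of the simulator into test harnesses" idea is the standard unit-testing pattern. Here is a tiny sketch using Python's built-in unittest (googletest would be the C++ analogue); the arbiter class is an invented stand-in for a simulator component, not actual gem5 code:

```python
import unittest

class RoundRobinArbiter:
    """Invented stand-in for a small simulator component."""
    def __init__(self, n_ports):
        self.n_ports = n_ports
        self.last = -1  # port granted most recently

    def grant(self, requests):
        """Grant the next requesting port after the last winner; None if idle."""
        for i in range(1, self.n_ports + 1):
            port = (self.last + i) % self.n_ports
            if requests[port]:
                self.last = port
                return port
        return None

class TestArbiter(unittest.TestCase):
    def test_rotates_among_requesters(self):
        arb = RoundRobinArbiter(3)
        self.assertEqual(arb.grant([True, True, False]), 0)
        self.assertEqual(arb.grant([True, True, False]), 1)
        # Port 2 never requests, so the grant wraps back to port 0.
        self.assertEqual(arb.grant([True, True, False]), 0)

    def test_no_requests(self):
        arb = RoundRobinArbiter(2)
        self.assertIsNone(arb.grant([False, False]))
```

Such a file would run in milliseconds via `python -m unittest`, which is exactly the "probably good in a couple of seconds" feedback loop point 1 asks for; the hard part, as noted above, is whether real components can be isolated without heavy mocking.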
