Hi Ali, I'm very interested in learning more about this new "memory checker" you've created. When do you expect to post your patch, or can you explain a bit more about what it does? We (AMD) have created a pretty significant relaxed-memory-model checker that is compatible with Ruby, but we have a lot of work to do before it is ready to be shared externally. I'm curious whether you've created something similar.
Thanks,
Brad

-----Original Message-----
From: gem5-dev [mailto:[email protected]] On Behalf Of Ali Saidi via gem5-dev
Sent: Thursday, December 04, 2014 3:12 AM
To: gem5 Developer List
Subject: Re: [gem5-dev] testing

Hi Gabe/Steve,

As Steve mentioned, I've been working on a new take on the regression system in my spare time. I want to get it a bit more complete before I show it to the world, but some of the goals I set out with align with yours. In particular, I'm targeting the following:

(1) Success/failure being more than stats or 'gem5 terminated because of ...'. I have a range of criteria that can be specified per test, from finding certain output on the terminal, in the stdout/stderr files, in the stats, etc. This code is easily extensible, so other criteria could be added.

(2) Adding a test should be easy. I think we should be testing our config files along with gem5, not a custom set of configs, so the regression tester takes command lines to gem5 that should be tested.

(3) Tests should be selectable on many criteria. Right now we test architectures and fs/se separately, but everything else is batched together. It's annoyingly hard to specify requirements like "test the o3 cpu on all architectures" or "just run the ruby tests."

(4) Allow simulation on a cluster/cloud if those resources are available to the user, but still work on a single machine. This at least provides more parallelism in running the regression tests.

(5) Support for dependent tests. The notion that test A must be run and generate output before test B can be run is really powerful. It lets you create tests that generate checkpoints and restore from them, or only restore from them, which would let us get to the interesting parts of tests much more quickly.

(6) Much better reporting of tests, being able to output differences between different regression runs, etc.

I view some of the points you brought up as orthogonal to this, although still very important.
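[Editorial illustration: Ali's framework has not been posted, so the following is purely a hypothetical sketch of how goals (1), (2), (3), and (5) above might look as a declarative test spec. Every class, field, and test name here is invented, not taken from any real gem5 code.]

```python
# Hypothetical sketch of a declarative regression-test spec (invented names):
# a test is a gem5 command line plus pass criteria and optional dependencies.
import re
from dataclasses import dataclass, field

@dataclass
class TestSpec:
    name: str
    cmdline: list                 # gem5 command line to run, per goal (2)
    tags: set = field(default_factory=set)   # e.g. {"arm", "o3", "ruby"}, per goal (3)
    terminal_re: str = None       # regex that must appear in terminal output, per goal (1)
    stdout_re: str = None         # regex that must appear on stdout, per goal (1)
    depends_on: list = field(default_factory=list)  # prerequisite tests, per goal (5)

    def passed(self, terminal: str, stdout: str) -> bool:
        """Apply each declared criterion; every declared one must hold."""
        if self.terminal_re and not re.search(self.terminal_re, terminal):
            return False
        if self.stdout_re and not re.search(self.stdout_re, stdout):
            return False
        return True

# A checkpoint-generating test and a dependent restore test (goal 5):
ckpt = TestSpec("arm-boot-ckpt",
                ["gem5.opt", "configs/example/fs.py", "--checkpoint-at-end"],
                tags={"arm", "fs"},
                terminal_re=r"login:")
restore = TestSpec("arm-boot-restore",
                   ["gem5.opt", "configs/example/fs.py", "-r", "1"],
                   tags={"arm", "fs"},
                   depends_on=["arm-boot-ckpt"])
```

A runner could then topologically sort specs by `depends_on`, filter them by `tags`, and fan the resulting command lines out to local processes or a cluster, which covers goal (4) without the spec format changing.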
Much better unit testing would be great, although I think this is a separate kind of testing from what we traditionally do for regression tests. I spent a long time trying to bend various unit test frameworks into something we could use to run regression tests, and they all had issues. The question is how we can sensibly unit test some of these components without spending huge amounts of effort mocking up interfaces.

Something we've done that has improved the situation a bit is the NoISA tests for the memory system. This is not a unit test but more of a directed test. Isolating the memory system and coupling it with a memory checker we're about to post should give a better indication of correctness in the memory system. A good next goal here is to hand-create some of the trickier cases in the memory system as traces we can replay and verify quickly, instead of relying on a few CPUs to hopefully generate those cases over millions or billions of simulated instructions.

Similarly, we should be able to do much better than the SPEC benchmarks we run today for regression tests. You're right that ideally we don't need to run the same loop a million times in a test. Any ideas where we could get something better? Preferably something that has self-checking code, is freely distributable, and covers many instructions. I wonder if LLVM or gcc has a repository of test code that might be useful?

Finally, I agree that a CI system that could automatically run regressions when a patch was posted would be a huge improvement over where we are today.

Thanks,
Ali

On 12/4/14, 7:56 AM, "Gabe Black via gem5-dev" <[email protected]> wrote:

>What I'd like is for us not to use stats as a pass/fail criterion. I'm not sure how that would work, but using the stats is pretty fragile and hard to maintain. It's tricky because you want to make sure the stats themselves are still correct, but there are lots of "correct" stats which are different.
>
>I agree that automatically deciding how much stats should change is not feasible. I haven't had a chance to read that wiki page, but one thing I remember, perhaps from the last time this came up, is that the regressions we run are benchmarks that do the same thing many times to get steady-state behavior. To verify something is correct, we don't need to loop over the same block of code thousands or millions of times. We could probably make things a lot faster without losing coverage that way, although changing the regression binaries wouldn't necessarily be very straightforward.
>
>While I think there are significant drawbacks to long-running tests, as detailed in my earlier email, there are benefits to really quick tests too. They could, for instance, be run automatically on every CL as part of a continuous integration system. It would also be practical to run all of them before sending a CL out. Right now I just take a best guess at which regressions are worth running, since running the long ones is a major time commitment, especially across all the ISAs.
>
>Gabe
>
>On Wed, Dec 3, 2014 at 9:58 PM, Steve Reinhardt via gem5-dev <[email protected]> wrote:
>
>> Hi Gabe,
>>
>> There's a long history here; I think everyone agrees the status quo wrt testing is inadequate, but there are a lot of different needs as well. I won't go into a lot of detail, but there is a wiki page left over from our last attempt: http://gem5.org/NewRegressionFramework. Actually, I see now that you contributed to an early version of that.
>>
>> I'm not opposed to us having more unit tests and a framework to run them. Having the ability to integrate unit tests into the regressions would be a good goal for a new regression system.
>>
>> Having better unit tests might provide a nice middle ground between, on the one hand, running a few tests targeting whatever you're doing (the bug you're fixing or the feature you're adding), plus a few quick "hello world" tests (which gives you a feeling that your change is "probably good", for some definition of probably); and on the other hand, running the full regression suite. I'm not sure it would replace either one of those, though. To be honest, I think the testing situation has bigger problems at this point; there's a lot on that wiki page, and unit testing isn't even mentioned.
>>
>> As far as your points 2 & 3: the regression tests do print out 'FAILED' vs. 'CHANGED' or something like that, so you can tell the difference between functional failures and stats changes pretty easily. You can look at the stats differences in the test output directory to see exactly what the changes are. The job of figuring out whether a particular set of stats changes is "reasonable" given some actual modeling change seems inherently impossible to automate, so I'm not sure what you're looking for there.
>>
>> Ali said he's been working on a new test framework; at this point I expect that's our best bet for moving forward. I'll let him decide whether he's ready to say more about it.
>>
>> Steve
>>
>> On Sun, Nov 23, 2014 at 6:51 AM, Gabe Black via gem5-dev <[email protected]> wrote:
>>
>> > Hi everybody. I'd like to start a conversation about testing strategies and gem5. Please let me know if my understanding is out of date, but I think the primary mechanism we use for testing is running benchmarks, booting, etc., and making sure the stats haven't changed. There are a few things that make that not so great.
>> >
>> > 1. Benchmarks can take a LONG time to run. I'd like to know whether my change is probably good in a couple of seconds, not a couple of hours.
>> > 2. There isn't much of an indication of *what* went wrong, just that something somewhere changed.
>> > 3. There isn't much of an indication *if* something went wrong. For a certain class of changes, it's reasonable to expect the stats to stay the same. For instance, a simulator performance optimization shouldn't change the stats. If you add a new device, change how execution works, fix some microcode, etc., you just have to guesstimate whether the amount of change looks reasonable and update the stats. Which, per 1, can take hours.
>> > 4. Merge conflicts. If two people make changes that affect the stats, one will go in first, and the other person will have to rebase on top of those changes and rerun the stats. Which, per 1, can take hours.
>> >
>> > I know writing new tests isn't what most people want to be doing with their time (including me), but as far as I can see this is a big shortcoming of the simulator as it stands. I think we would get a lot of benefit from more unit tests of both base functionality (we have a little of that) and of device models, execution bits, etc. (we have none?). We could either expand on the unit test code we have, or bring in an existing framework like this one:
>> >
>> > https://code.google.com/p/googletest/
>> >
>> > I've never used it, and I don't know much about it.
>> >
>> > It *should* be easy for us to use our modularity and object-oriented design to pull pieces of the simulator into test harnesses and make sure they do reasonable things in isolation. If it isn't, maybe that's something we should fix too.
>> >
>> > We should also think about how to make it easy/automatic to run unit tests, and how to get them to run automatically alongside the nightly regression runs.
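[Editorial illustration of Gabe's harness idea: pull one component behind its interface and drive it with a stub. This is not gem5 code — `Cache` and `mem_port` are invented stand-ins, and gem5's C++ components would use a C++ framework such as googletest — but the pattern is language-independent, shown here as a minimal Python sketch.]

```python
# Illustrative only: a toy "device model" tested in isolation with a mock
# in place of the interface it depends on. All names are invented.
import unittest
from unittest import mock

class Cache:
    """Toy stand-in for a simulator component with a memory-side port."""
    def __init__(self, mem_port):
        self.mem_port = mem_port
        self.store = {}

    def read(self, addr):
        if addr not in self.store:                 # miss: fetch from memory
            self.store[addr] = self.mem_port.read(addr)
        return self.store[addr]                    # hit: serve locally

class CacheTest(unittest.TestCase):
    def test_second_read_hits(self):
        mem = mock.Mock()
        mem.read.return_value = 0xAB
        cache = Cache(mem)
        self.assertEqual(cache.read(0x100), 0xAB)  # first read misses
        self.assertEqual(cache.read(0x100), 0xAB)  # second read hits
        mem.read.assert_called_once_with(0x100)    # memory touched only once
```

A test like this runs in milliseconds with `python -m unittest`, which is exactly the property that makes per-CL continuous integration practical.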
>> >
>> > Gabe
>> > _______________________________________________
>> > gem5-dev mailing list
>> > [email protected]
>> > http://m5sim.org/mailman/listinfo/gem5-dev
