Hi Ali,

I'm very interested to learn more about this new "memory checker" you've 
created. When do you expect to post your patch, or can you explain a bit more 
about what it does? We (AMD) have created a pretty significant relaxed memory 
model checker that is compatible with Ruby, but we have a lot of work to do 
before it is ready to be shared externally. I'm curious to know whether you've 
created something similar.

Thanks,

Brad



-----Original Message-----
From: gem5-dev [mailto:[email protected]] On Behalf Of Ali Saidi via 
gem5-dev
Sent: Thursday, December 04, 2014 3:12 AM
To: gem5 Developer List
Subject: Re: [gem5-dev] testing

Hi Gabe/Steve,

As Steve mentioned, I've been working on a new take on the regression system in 
my spare time.

I want to get it a bit more complete before I show it to the world, but some of 
the goals I set out with align with yours.

In particular I'm targeting the following:
(1) Success/failure being more than stats or 'gem5 terminated because of ...'. 
I have a range of criteria that can be specified per test, from finding certain 
output on the terminal to checks on the stdout/stderr files, stats, etc. This 
code is extensible, so other criteria can easily be added.
(2) Adding a test should be easy. I think we should be testing our config files 
along with gem5, not a custom set of configs, so the regression tester takes 
gem5 command lines that should be tested.
(3) Tests should be selectable on many criteria. Right now we test 
architectures and fs/se separately, but everything else is batched together. 
It's annoyingly hard to specify requirements like "test the o3 cpu on all 
architectures" or "just run ruby tests."
(4) Allow simulation on a cluster/cloud if the resources are available to the 
user, but still work on a single machine. This at least provides more 
parallelism in running the regression tests.
(5) Support for dependent tests. The notion that test A must be run and 
generate output before test B can be run is really powerful. It lets you create 
tests that generate checkpoints and restore from them, or that only restore 
from existing checkpoints, which would let us get to the interesting parts of 
tests much more quickly.
(6) Much better reporting of tests, being able to output differences between 
different regression runs, etc.
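
To make goals (1), (2), and (5) concrete, here is a minimal sketch of what a 
declarative test spec could look like. Every name in it is hypothetical; none 
of these classes exist in gem5 today, it just illustrates the shape of the idea:

```python
class Criterion:
    """Base class for a pass/fail check (goal 1); subclass to extend."""
    def check(self, run):
        raise NotImplementedError


class TerminalContains(Criterion):
    """Pass if a string appears in the simulated terminal output."""
    def __init__(self, text):
        self.text = text

    def check(self, run):
        return self.text in run.get("terminal", "")


class StatEquals(Criterion):
    """Pass if a named stat has an exact expected value."""
    def __init__(self, name, value):
        self.name, self.value = name, value

    def check(self, run):
        return run.get("stats", {}).get(self.name) == self.value


class Test:
    def __init__(self, name, cmdline, criteria, depends_on=None):
        self.name = name
        self.cmdline = cmdline        # a plain gem5 command line (goal 2)
        self.criteria = criteria      # extensible list of checks (goal 1)
        self.depends_on = depends_on  # e.g. a checkpoint-creating test (goal 5)

    def passed(self, run):
        """run: dict of collected outputs (terminal text, parsed stats, ...)."""
        return all(c.check(run) for c in self.criteria)
```

A checkpoint/restore pair (goal 5) would then just be two Test objects where 
the second names the first in depends_on, and the runner orders them 
accordingly.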


I view some of the points you brought up as orthogonal to this, although still 
very important.

Much better unit testing would be great, although I think this is a separate 
kind of testing from what we traditionally do for regression tests. I spent a 
long time trying to bend various unit test frameworks into something that we 
could use to run regression tests and they all had issues in doing so. The 
question is how can we sensibly unit test some of these components without 
spending huge amounts of effort mocking up interfaces? Something we've done 
that has improved the situation a bit is the NoISA tests for the memory 
system. These are not unit tests but more of a directed test. Isolating the 
memory system and coupling it with a memory checker we're about to post should 
give a better indication of correctness in the memory system. A good next goal 
here is to hand-create some of the trickier cases in the memory system as 
traces we can replay, so we can hopefully verify that they work very quickly 
instead of relying on a few CPUs to hopefully generate those cases over 
millions or billions of simulated instructions.
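
As a rough sketch of what replaying such a hand-written trace might look like: 
the trace format below is invented purely for illustration. Each entry is 
('st', addr, value) or ('ld', addr, expected), and the checker verifies that 
every load observes the most recent store to its address (per-location 
coherence only; a real relaxed-model checker would do much more):

```python
def check_trace(trace):
    """Replay a hand-written memory trace and flag the first bad load.

    Returns (True, None) if every load sees the latest store to its
    address, or (False, i) where i is the index of the mismatching entry.
    """
    mem = {}  # last value stored at each address
    for i, (op, addr, val) in enumerate(trace):
        if op == 'st':
            mem[addr] = val
        elif op == 'ld' and mem.get(addr) != val:
            return False, i
    return True, None
```

Because the interesting orderings are written down explicitly, a trace like 
this runs in microseconds rather than waiting for CPUs to stumble into the 
same interleaving.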

Similarly, we should be able to do much better than the SPEC benchmarks we run 
today for regression tests. You're right that ideally we don't need to run the 
same loop a million times in a test. Any ideas where we could get something 
better? Preferably something that has self-checking code, is freely 
distributable, and covers many instructions. I wonder if LLVM or gcc has a 
repository of test code that might be useful?

Finally, I agree that a CI system that could automatically run regressions when 
a patch was posted would be a huge improvement from where we are today.

Thanks,

Ali




On 12/4/14, 7:56 AM, "Gabe Black via gem5-dev" <[email protected]> wrote:

>What I'd like is for us not to use stats as a pass/fail criteria. I'm 
>not sure how that would work, but using the stats is pretty fragile and 
>hard to maintain. It's tricky because you want to make sure the stats 
>themselves are still correct, but there are lots of "correct" stats 
>which are different. I agree that automatically deciding how much stats 
>should change is not feasible. I haven't had a chance to read that wiki 
>page, but one thing I remember, perhaps from the last time this came 
>up, is that the regressions we run are benchmarks that do the same 
>thing many times to get steady state behavior. To verify something is 
>correct, we don't need to loop over the same block of code thousands or 
>millions of times. We could probably make things a lot faster without 
>losing coverage that way, although changing the regression binaries 
>wouldn't necessarily be very straightforward.
>
>While I think there are significant drawbacks to long-running tests, as 
>detailed in my earlier email, there are benefits to really quick tests 
>too. They could, for instance, be run automatically on every CL as part 
>of a continuous integration system. It would also be practical to run 
>all of them before sending a CL out. Right now I just take a best guess 
>at which regressions are worth running, since running the long ones is 
>a major time commitment, especially across all the ISAs.
>
>Gabe
>
>On Wed, Dec 3, 2014 at 9:58 PM, Steve Reinhardt via gem5-dev < 
>[email protected]> wrote:
>
>> Hi Gabe,
>>
>> There's a long history here; I think everyone agrees the status quo wrt 
>> testing is inadequate, but there are a lot of different needs as well. I 
>> won't go into a lot of detail, but there is a wiki page left over from our 
>> last attempt: http://gem5.org/NewRegressionFramework. Actually I see now 
>> that you contributed to an early version of that.
>>
>> I'm not opposed to us having more unit tests and a framework to run them. 
>> Having the ability to integrate unit tests into the regressions would be a 
>> good goal for a new regression system.
>>
>> Having better unit tests might provide a nice middle ground between, on 
>> the one hand, running a few tests targeting whatever you're doing (the bug 
>> you're fixing or feature you're adding), plus a few quick "hello world" 
>> tests (which gives you a feeling that your change is "probably good", for 
>> some definition of probably); and on the other hand, running the full 
>> regression suite. I'm not sure it would replace either one of those, 
>> though. To be honest, I think the testing situation has bigger problems at 
>> this point; there's a lot on that wiki page, and unit testing isn't even 
>> mentioned.
>>
>> As far as your points 2 & 3: The regression tests do print out 'FAILED' 
>> vs. 'CHANGED' or something like that, so you can tell the difference 
>> between functional failures and stats changes pretty easily. You can look 
>> at the stats differences in the test output directory to see exactly what 
>> the changes are. The job of figuring out whether a particular set of stats 
>> changes is "reasonable" given some actual modeling change seems inherently 
>> impossible to automate, so I'm not sure what you're looking for there.
>>
>> Ali said he's been working on a new test framework; at this point I expect 
>> that's our best bet for moving forward. I'll let him decide whether he's 
>> ready to say more about it.
>>
>> Steve
>>
>> On Sun, Nov 23, 2014 at 6:51 AM, Gabe Black via gem5-dev < 
>> [email protected]>
>> wrote:
>>
>> > Hi everybody. I'd like to start a conversation about testing strategies 
>> > and gem5. Please let me know if my understanding is out of date, but I 
>> > think the primary mechanism we use for testing is running benchmarks, 
>> > booting, etc., and making sure the stats haven't changed. There are a 
>> > few things that make that not so great.
>> >
>> > 1. Benchmarks can take a LONG time to run. I'd like to know whether my 
>> > change is probably good in a couple seconds, not a couple hours.
>> > 2. There isn't much of an indication of *what* went wrong, just that 
>> > something somewhere changed.
>> > 3. There isn't much of an indication *if* something went wrong. For a 
>> > certain class of changes, it's reasonable to expect the stats to stay 
>> > the same. For instance, a simulator performance optimization shouldn't 
>> > change the stats. If you add a new device, change how execution works, 
>> > fix some microcode, etc., you just have to guesstimate if the amount of 
>> > change looks reasonable and update the stats. Which, per 1, can take 
>> > hours.
>> > 4. Merge conflicts. If two people make changes that affect the stats, 
>> > one will go in first, and the other person will have to rebase on top of 
>> > those changes and rerun the stats. Which, per 1, can take hours.
>> >
>> > I know writing new tests isn't what most people want to be doing with 
>> > their time (including me), but as far as I can see this is a big 
>> > shortcoming of the simulator as it stands. I think we would get a lot of 
>> > benefit from more unit tests of both base functionality (we have a 
>> > little of that), and of device models, execution bits, etc. (we have 
>> > none?). We could either expand on the unit test code we have, or bring 
>> > in an existing framework like this one:
>> >
>> > https://code.google.com/p/googletest/
>> >
>> > I've never used it and don't know much about it.
>> >
>> > It *should* be easy for us to use our modularity and object-oriented 
>> > design to pull pieces of the simulator into test harnesses and make sure 
>> > they do reasonable things in isolation. If it isn't, maybe that's 
>> > something we should fix too.
>> >
>> > We should also think about how to make it easy/automatic to run unit 
>> > tests, and how to get them to run automatically alongside the nightly 
>> > regression runs.
>> >
>> > Gabe
>> > _______________________________________________
>> > gem5-dev mailing list
>> > [email protected]
>> > http://m5sim.org/mailman/listinfo/gem5-dev
>> >


