++ to this likely just being an issue of reading the wrong stat. I've
personally diffed every instruction on a small run of libquantum (though on
ARM).
You can always implement a "poor man's checker" to execute two gem5 cpu
models in lock-step, verifying the committed instruction path (assuming
single-threaded SE). This is how I brought up and verified my own cpu
model in gem5.
Step 1) Add a DPRINTF (with a new trace flag) at commit that simply prints
out the instruction being committed on the various CPU models you wish to
compare.
DPRINTF(CommitInst, "Committing instruction [pc:%#x (%d)].%s\n",
        pcState.instAddr(),
        pcState.microPC(),
        curStaticInst->disassemble(pcState.instAddr()).c_str());
Step 2) Create 4 fifos in the gem5 output directory (m5out)
>> cd m5out
>> mkfifo cpu1_fifo1
>> mkfifo cpu1_fifo2
>> mkfifo cpu2_fifo1
>> mkfifo cpu2_fifo2
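If you end up rerunning the comparison a lot, step 2 can be scripted so that it does not fail when the fifos already exist. This is only a sketch: the helper name make_fifos is made up for illustration, and the fifo names are simply the ones used above.

```python
import os
import stat

# Same four names as in the mkfifo commands above.
FIFO_NAMES = ["cpu1_fifo1", "cpu1_fifo2", "cpu2_fifo1", "cpu2_fifo2"]


def make_fifos(outdir, names=FIFO_NAMES):
    """Create each named pipe in outdir unless it is already there."""
    os.makedirs(outdir, exist_ok=True)
    paths = []
    for name in names:
        path = os.path.join(outdir, name)
        if os.path.exists(path):
            # Refuse to reuse a regular file: a trace written to it would
            # pile up on disk instead of streaming to the reader.
            if not stat.S_ISFIFO(os.stat(path).st_mode):
                raise RuntimeError(f"{path} exists and is not a fifo")
        else:
            os.mkfifo(path)
        paths.append(path)
    return paths
```

Calling make_fifos("m5out") from the gem5 directory should leave you in the same state as the four mkfifo commands.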
Step 3) Write a small perl/python script that just strips off the cycle
time info from the output
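#!/usr/bin/perl -w
# Keep only the text from "Committing" onward, dropping the tick prefix.
$| = 1;  # unbuffered output, so the downstream reader sees each line promptly
while (<>) {
    if (/.*(Committing.*)/) {
        print "$1\n";
    }
}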
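For anyone who prefers python for step 3, a rough equivalent might look like the following. The function names are my own, and the sample trace line in the usage note is invented to show the shape of the input, not copied from a real run.

```python
import re

# Same pattern as the perl version: the greedy prefix eats the tick and
# object name, and the group keeps everything from "Committing" onward.
COMMIT_RE = re.compile(r".*(Committing.*)")


def strip_tick(line):
    """Return the commit record without its prefix, or None for other lines."""
    match = COMMIT_RE.search(line)
    return match.group(1) if match else None


def filter_stream(src, dst):
    """Copy only commit records from src to dst, one per line."""
    for line in src:
        record = strip_tick(line)
        if record is not None:
            dst.write(record + "\n")
            # Flush per record so the two simulators stay close together.
            dst.flush()
```

To use it like parse.pl, open the first fifo for reading and the second for writing and pass both to filter_stream. A line such as "1000: system.cpu: Committing instruction [pc:0x8000 (0)].nop" would come out as "Committing instruction [pc:0x8000 (0)].nop".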
Step 4) Run two instances of gem5 (with different CPUs) outputting to
their respective "fifo1"
./build/ARM/gem5.opt --debug-flag=CommitInst \
    --trace-file=cpu1_fifo1 configs/example/se.py --cpu-type=timing \
    --cmd=462.libquantum -o "15 2"
./build/ARM/gem5.opt --debug-flag=CommitInst \
    --trace-file=cpu2_fifo1 configs/example/se.py --cpu-type=detailed \
    --cmd=462.libquantum -o "15 2"
Step 5) Filter both of these output fifos through the script from step 3
into the second fifo
./parse.pl m5out/cpu1_fifo1 > m5out/cpu1_fifo2
./parse.pl m5out/cpu2_fifo1 > m5out/cpu2_fifo2
Step 6) Diff/compare the two fifos. Once you execute this command, both
instances of gem5 will run in lock-step. If any committed instruction
differs, cmp will stop and report the line number (equivalent to the index
of the committed micro-op) where execution diverged.
cmp m5out/cpu1_fifo2 m5out/cpu2_fifo2
This is an "easy" way to verify two gem5 cpu models are committing the
exact same sequence of instructions without generating huge trace files.
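cmp only tells you where the streams differ. If you also want to see the two mismatching records, a small python replacement for step 6 could look like this. It is a sketch under the same assumptions as above (one commit record per line in each filtered fifo), and first_divergence is a name I made up.

```python
from itertools import zip_longest


def first_divergence(stream_a, stream_b):
    """Return (line_number, a_line, b_line) at the first mismatch, else None.

    Lines are pulled from both streams one pair at a time, so when the
    inputs are fifos neither writer can run far ahead of the other.
    A stream that ends early shows up as None on that side.
    """
    pairs = zip_longest(stream_a, stream_b)
    for lineno, (a, b) in enumerate(pairs, start=1):
        if a != b:
            return lineno, a, b
    return None
```

You would open m5out/cpu1_fifo2 and m5out/cpu2_fifo2 for reading and pass the two file objects in. A result of None means both models committed the same sequence.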
On Thu, Apr 25, 2013 at 10:29 PM, Ashish Venkat <[email protected]> wrote:
> Hi Zeb,
>
> If you want to compare the number of instructions committed in O3 vs
> atomic, compare the following 2 stats:
> O3: cpu.commit.commitCommittedInsts
> Atomic: cpu.num_insts (which goes into sim_insts)
> These two should generally be equal.
>
> The cpu.committedInsts (which goes into sim_insts) in O3 gives you the
> number of committed instructions excluding NOPs and prefetch
> instructions. I think that explains the difference.
>
> -Ashish
>
> On Wed, Mar 13, 2013 at 8:30 AM, Zebulun Barnett <[email protected]>
> wrote:
> > tl;dr Detailed cpu borked? --prog-interval borked? definitely :(
> >
> > Greetings,
> >
> > My research group has recently started using Gem5 (coming from m5) and we
> > have noticed an anomaly with the Alpha SE Detailed(O3) CPU model. Of the 4
> > types of CPU available (Atomic, Detailed(O3), Timing, InOrder), all but the
> > Detailed model take the exact same number of instructions to complete a
> > benchmark. The Detailed CPU consistently requires fewer instructions (about
> > 10% fewer) to complete a given benchmark. Multiple benchmarks have indicated
> > the same result. We are aware that the main difference between the Detailed
> > CPU and the others is that it is an out-of-order processor. Is it possible
> > this is the cause of the difference? Is it simply handling the instructions
> > more efficiently?
> >
> > During our testing, we attempted to use fast forwarding to convince
> > ourselves that the different CPU types actually did commit a different
> > number of instructions. In libquantum, one of the benchmarks in which we
> > noticed this behavior, the atomic cpu commits ~289 million instructions
> > while the detailed cpu commits ~269 million instructions. With fast
> > forwarding (using the atomic cpu and switching to the detailed model at 200
> > million instructions) the total number of instructions committed is ~282
> > million. This number convinces us that the detailed model indeed commits a
> > different number of instructions than the other types.
> >
> > Also, during the fast forward test, we set --prog-interval to 1,000,000
> > instructions. The interval behaved normally up to the switch, but once the
> > detailed CPU took over, it started reporting the same value every time. Each
> > printout after the switch was stuck at 2,000,001 instructions and the
> > committed instruction value was 0. However, the simulation completed as if
> > it committed all instructions successfully. We will submit a bug report for
> > this specific issue, but if anyone else has experienced this, please let us
> > know.
> >
> > If these are well-known or obvious issues, I apologize in advance for
> > wasting your time. Let it be known that I did search the archives to no
> > avail.
> >
> > Any insight would be most appreciated.
> >
> > Thank you,
> > Zeb Barnett
> >
> > -----
> > Student Research Assistant
> > High Performance Computer Lab
> > Lamar University
> >
> >
> > _______________________________________________
> > gem5-users mailing list
> > [email protected]
> > http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users