I had to remove the files because of the space limitation!
-------- Original Message --------
Subject: Re: [gem5-users] problem while running the same binary file in
different frequencies for ARM platform
Date: Fri, 22 Feb 2013 21:24:00 +0000
From: Negar Miralaei <[email protected]>
To: Erik Tomusk <[email protected]>
CC: [email protected]
Hi Erik,
Thank you again for your response.
I think I should mention some points. First of all, I did run 4
benchmarks on 4 cores with 1bn instructions. After I got the strange
results, I tried one benchmark with 1bn/100m/10m/... instructions, with
and without cache. Finally, I started debugging the code to find the
problem, so I used just 10 instructions to speed up the test process.
The graph I attached to the previous mail is for at least 10m
instructions for one benchmark (ammp).
About sim_seconds, here is one of the numbers I got from the
stats.txt file, for 100m instructions. As you can see, sim_seconds
and sim_ticks are similar (somehow!), and this is true for every run
and every benchmark. So, when I calculated CPI or IPC, the result
wasn't acceptable to me.
sim_seconds 1.259019 # Number of seconds simulated
sim_ticks 1259019331000 # Number of ticks simulated
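
The two figures above differ only by a constant factor. A minimal sketch of that relationship, assuming sim_freq is 1000000000000 ticks per second as in the stats table quoted later in this thread (the function name ticks_to_seconds is made up for illustration):

```python
# sim_freq is assumed to be 1e12 ticks per simulated second, as reported
# in the stats table quoted later in this thread.
SIM_FREQ = 1_000_000_000_000

def ticks_to_seconds(ticks, sim_freq=SIM_FREQ):
    """Convert a tick count to simulated seconds (hypothetical helper)."""
    return ticks / sim_freq

# sim_ticks value copied from the stats line above.
sim_seconds = ticks_to_seconds(1_259_019_331_000)
print(f"{sim_seconds:.6f}")
```

Under that assumption, sim_seconds is sim_ticks divided by sim_freq, which is why the digits of the two values look alike in every run.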
I attached 5 more files just in case you had time to take a brief look
at them. 'stats-100m-g3' and 'stats-1m' are both stats files for a set
of 4 benchmarks running on 4 cpu cores with l2 cache. The only
difference is the number of instructions. The three other stats are for
one benchmark running on one cpu core with 3 different frequencies. Now,
if these outputs are correct and meaningful, I'd really appreciate it if
you could tell me what I should calculate as an accurate comparison
parameter for different frequencies.
Cheers
Negar
On 2/22/2013 7:16 PM, Erik Tomusk wrote:
Hi Negar,
A sim_tick has nothing to do with seconds or clock cycles. The best
explanation for your graphs is that you are looking at noise.
If you want to measure speed, you can plot sim_seconds against
frequency. You will need to run for much longer than 10 instructions
for there to be a noticeable difference.
To calculate CPI, divide numCycles by committedInsts. If you only run for
10 instructions, your caches will be cold, and you won't see any
meaningful data. Again, try running for 10M-100M instructions, do that
for several frequencies, and see if the results make sense.
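
A minimal sketch of that CPI calculation, using the numCycles and committedInsts values from the 10-instruction stats quoted further down this thread. The helper name cpi and the runs dictionary are illustrative only, and numbers from such a short run are not meaningful measurements.

```python
def cpi(num_cycles, committed_insts):
    """Cycles per instruction: numCycles divided by committedInsts."""
    if committed_insts <= 0:
        raise ValueError("committed_insts must be positive")
    return num_cycles / committed_insts

# (numCycles, committedInsts) per frequency, copied from the quoted stats.
runs = {"500MHz": (270, 11), "800MHz": (432, 11)}

for label, (cycles, insts) in runs.items():
    print(f"{label}: CPI = {cpi(cycles, insts):.2f}")
```

The same function can be applied to the numbers from longer runs at each frequency to compare them.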
-Erik
On 21/02/13 23:36, Negar Miralaei wrote:
Hi Erik,
Thank you very much for your description. Now, I can understand all
these numbers. But, I still cannot accept that when I run one
application with different frequencies I get the same speed. I had
the same results when I ran a set of 4 benchmarks on 4 cores in a
system. I thought that when I run something faster I would get a
smaller simulation time, even for the same application. Is it because
I'm forcing the simulator to run a specific number of instructions?
The faster CPU should finish executing the same number of
instructions earlier, shouldn't it? What is the best metric for seeing
the application's behaviour under different frequencies? (CPI [in gem5:
cycles per instruction, or clocks per instruction?])
Given your description, how can I explain the trend in simulation
time I got at different frequencies, again for one application? (I
attached the trend if you want to have a look.)
Sorry if I asked lots of questions, and thank you again for your help
and attention.
Cheers
Negar
On 2/21/2013 6:33 PM, Erik Tomusk wrote:
Hi Negar,
clock and cycles are what you would expect to see in hardware: If
your clock (period) is set to 1000 (picoseconds), then you're
running at 1GHz, and will see 1 billion cycles every second. A tick
is an artificial simulator construct that represents the smallest
unit of time that the simulator can model. It doesn't correspond to
anything in the real world (except maybe Planck's time...).
So in your examples, it took 270 and 432 CPU cycles respectively to
execute the 10 instructions. This is because if you have a faster
clock, some actions take more clock cycles. E.g. if you need to wait
100ns, that's 50 cycles for a 500MHz clock, but 80 cycles for an
800MHz clock.
In other words, if the workload you are running is limited by very
slow memory operations, then you would expect the 800MHz version to
require (8/5) as many cycles as the 500MHz version. In this case,
270*8/5=432, so the numbers are spot on.
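
The arithmetic above can be sketched as follows. The function names are made up for illustration; the inputs are the 100ns wait and the 270 and 432 cycle counts mentioned in this message.

```python
def latency_to_cycles(latency_ns, freq_mhz):
    """Number of clock cycles that fit in a fixed wall-clock latency."""
    return latency_ns * freq_mhz / 1000

def cycles_to_ns(cycles, freq_mhz):
    """Simulated time, in nanoseconds, covered by a number of cycles."""
    return cycles * 1000 / freq_mhz

# A 100ns wait costs more cycles at the faster clock.
print(latency_to_cycles(100, 500))
print(latency_to_cycles(100, 800))

# Both cycle counts cover the same simulated time.
print(cycles_to_ns(270, 500))
print(cycles_to_ns(432, 800))
```

Both cycle counts work out to 540ns, which fits the explanation that the 10 instructions were dominated by fixed-latency operations rather than by the CPU clock.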
All that sim_ticks is telling you is how many ticks gem5 used under
the hood to complete the 10 instructions. Given that you ran exactly
the same workload both times, it's reasonable for the two numbers to
be similar. Like you say, there could be some extra ticks if the
clock period isn't a whole number, but this isn't a problem because
sim_ticks is just something gem5 uses internally to keep track of
when things happen. You wouldn't use it in CPI calculations, etc.
Hope that helps
-Erik
On 21/02/13 16:47, Negar Miralaei wrote:
Dear Gem5 Authors,
I'm running some single-core benchmarks (cpu2000) on gem5 with the
ARM platform. I've got some strange results while changing the
frequency of the processor. I tried debugging the gem5 code and now
I'm confused by some parameter settings concerning the clock, ticks
and cycles. I would be very grateful if you could clarify the
following results.
First of all, this is the command I used (I reduced the number of
instructions for testing):
build/ARM/gem5.opt --stats-file=new-test/stats-gap-big.txt
--trace-file=new-test/trace-gap-big.out --debug-flag=All -d
system/disks/CPU2000/output/ --remote-gdb-port=0
system/disks/CPU2000/configs/mp-se-changed.py --fastfwd-insts 10000
--max-insts 10 --bench gap &>
system/disks/CPU2000/output/new-test/jobout-gap-big.txt
I ran one benchmark at 2 different frequencies and got these
results in the stats.txt file:
STATS                                 500MHz          800MHz
sim_seconds                           0.000001        0.000001
sim_ticks                             541000          541000
final_tick                            14490000        14490000
sim_freq                              1000000000000   1000000000000
host_inst_rate                        4982188         5011618
host_op_rate                          6197423         6233214
host_tick_rate                        265854982       267367654
host_mem_usage                        625692          625692
sim_insts                             10012           10012
sim_ops                               12535           12535
system.mainCpu.numCycles              270             432
system.mainCpu.committedInsts         11              11
system.mainCpu.committedOps           11              11
system.mainCpu.num_int_alu_accesses   10              10
I'm a bit confused about the relationship between the number of
cycles, the number of ticks, and the clock. I thought the number of
cycles should be the same and the ticks should differ! I cannot
understand the calculation in the update() function in
clocked_object.hh: does the number of cycles change based on the
clock and the number of ticks? I thought each operation took a fixed
number of cycles, independent of time.
Another problem with time and events: when I looked at
IcachePort::recvTimingResp(PacketPtr pkt) and
DcachePort::recvTimingResp(PacketPtr pkt) in timing.cc, I saw that
the code checks whether next_tick equals curTick and, if not,
schedules the event for the next tick. This 'if' is always true for
500MHz and 800MHz, because the product of clock and cycle equals the
tick, but the 'else' branch is always taken for some other
frequencies such as 550MHz. Those runs therefore execute one or two
extra events, described as 'Timing CPU icache tick' and 'Timing CPU
dcache tick', which makes 550MHz even slower than 500MHz! I'd be very
grateful if you could clarify these differences and similarities.
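
A small sketch of why 550MHz behaves differently from 500MHz and 800MHz, assuming one tick is one picosecond (sim_freq of 1000000000000 in the stats above). It only computes the exact clock period in ticks; it does not reproduce how gem5 rounds or schedules events, and period_in_ticks is a made-up helper name.

```python
from fractions import Fraction

TICKS_PER_SECOND = 10**12  # assumed from sim_freq in the stats above

def period_in_ticks(freq_hz):
    """Exact clock period in ticks, kept as a fraction to expose remainders."""
    return Fraction(TICKS_PER_SECOND, freq_hz)

for freq_hz in (500_000_000, 800_000_000, 550_000_000):
    period = period_in_ticks(freq_hz)
    whole = period.denominator == 1
    print(f"{freq_hz // 10**6}MHz: {float(period):.2f} ticks, whole={whole}")
```

Only the 550MHz period is not a whole number of ticks, so a cycle boundary at that frequency cannot always coincide with the current tick.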
Thanks
Negar
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.