-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2120/#review4833
-----------------------------------------------------------
src/sim/clock_domain.hh <http://reviews.gem5.org/r/2120/#comment4520>

    Spacing around the ( and ): ...Domain(ClockedObject *c)

src/sim/clock_domain.hh <http://reviews.gem5.org/r/2120/#comment4521>

    It would be good to add asserts to check that c != NULL and that c is
    not in members already.

src/sim/clock_domain.cc <http://reviews.gem5.org/r/2120/#comment4524>

    I'd suggest sticking with auto m = members.begin()...

src/sim/clock_domain.cc <http://reviews.gem5.org/r/2120/#comment4522>

    I'd prefer to add a public method to the ClockedObject, called
    updateClockPeriod, just like with the domains. The updateClockPeriod
    could just call the private update() (see the sketch below).

src/sim/clocked_object.hh <http://reviews.gem5.org/r/2120/#comment4525>

    Spacing around (this)


You beat us to it :-) Thanks for providing this functionality. I would
perhaps shorten the description a bit and highlight that the patch provides
support for DFS (it was simply not intended to do this before... so it was
not really a bug).

- Andreas Hansson
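
For concreteness, here is a rough sketch of how the registration asserts and
the forwarding updateClockPeriod() could fit together. It is only an
illustration: the identifiers members, registerWithClockDomain and the body
of SrcClockDomain::clockPeriod are assumptions, not necessarily what the
patch actually uses:

    // Sketch only -- member and method names are assumed, not taken from
    // the patch under review.
    #include <algorithm>
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    typedef uint64_t Tick;

    class ClockedObject
    {
      public:
        // Public wrapper suggested above; it simply forwards to the
        // private update().
        void updateClockPeriod() { update(); }

      private:
        // In gem5 this recalculates the cached tick and cycle against
        // curTick(); stubbed out here.
        void update() { }
    };

    class ClockDomain
    {
      public:
        void registerWithClockDomain(ClockedObject *c)
        {
            // Suggested sanity checks: non-NULL and not already a member.
            assert(c != NULL);
            assert(std::find(members.begin(), members.end(), c) ==
                   members.end());
            members.push_back(c);
        }

      protected:
        std::vector<ClockedObject *> members;
    };

    class SrcClockDomain : public ClockDomain
    {
      public:
        SrcClockDomain() : period(0) { }

        void clockPeriod(Tick p)
        {
            // Bring every member up to curTick() *before* the period
            // changes, so nobody later spans ticks run at the old period
            // with the new period.
            for (auto m = members.begin(); m != members.end(); ++m)
                (*m)->updateClockPeriod();
            period = p;
        }

      private:
        Tick period;
    };

    int main()
    {
        ClockedObject cache;
        SrcClockDomain domain;
        domain.registerWithClockDomain(&cache);
        domain.clockPeriod(1999);  // members get updated first
        return 0;
    }
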
On Dec. 16, 2013, 3:28 p.m., Christopher Torng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2120/
> -----------------------------------------------------------
> 
> (Updated Dec. 16, 2013, 3:28 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> -------
> 
> sim: Fix a bug when scaling a clock domain's frequency
> 
> A SrcClockDomain (src/sim/clock_domain.hh) can change its clock period
> with the clockPeriod(Tick) function. However, when a SrcClockDomain
> changes the clock period, the ClockedObjects in the clock domain may be
> inactive. When these ClockedObjects wake up, they update their internal
> tick incorrectly, leading to incorrect latency calculations throughout
> the system. For example, a 2-cycle L1 cache hit latency can turn into 3
> cycles, causing significant slowdowns in simulated runtime after the
> frequency change (4% to 20% slowdowns in my experiments).
> 
> This patch fixes the bug by adding a member list to the ClockDomain --
> ClockedObjects register themselves with their clock domain at
> construction time and are added to the member list. Using this list,
> before a clock domain modifies its clock period, it can update each
> member's tick to curTick().
> 
> Diffed from Changeset 9993
> 
> 
> Diffs
> -----
> 
>   src/sim/clocked_object.hh bdd606534bdc
>   src/sim/clock_domain.cc bdd606534bdc
>   src/sim/clock_domain.hh bdd606534bdc
> 
> Diff: http://reviews.gem5.org/r/2120/diff/
> 
> 
> Testing
> -------
> 
> Purely to demonstrate this bug, I've added the following code, using the
> inorder cpu.
> 
> In src/sim/stat_control.cc (changed precision from 6 to 12 for more
> detail)::
> 
>     simSeconds
>         .name("sim_seconds")
>         .desc("Number of seconds simulated")
>         .precision(12)
>         ;
> 
> In src/sim/clocked_object.hh (added function for a clocked object to get
> its clock domain's pointer)::
> 
>     inline ClockDomain* getClockDomain() const
>     {
>         ClockDomain *c = &clockDomain;
>         return c;
>     }
> 
> In src/cpu/inorder/InOrderCPU.py (change stageWidth to 1 to make dumps
> easier to read)::
> 
>     stageWidth = Param.Unsigned(1, "Stage width")
> 
> In src/cpu/inorder/cpu.hh (create a pointer to a SrcClockDomain to
> control the CPU's clock period from within the CPU)::
> 
>     SrcClockDomain *clockDomain;
> 
> In src/cpu/inorder/cpu.cc (let the CPU constructor set the clock domain
> pointer)::
> 
>     InOrderCPU::InOrderCPU(Params *params)
>         : (...)
>     {
>         (...)
>         clockDomain = (SrcClockDomain*) getClockDomain();
>     }
> 
> In src/cpu/inorder/cpu.cc (after 5000 cycles, change the clock period to
> 1999 ps)::
> 
>     void
>     InOrderCPU::tick()
>     {
>         (...)
> 
>         if (numCycles.value() > 5000)
>         {
>             clockDomain->clockPeriod(1999);
>         }
>     }
> 
> After compiling these testing changes, I run MIPS hello world at 500 MHz
> (2000 ps)::
> 
>     $ ./build/MIPS/gem5.opt configs/example/se.py \
>         -c tests/test-progs/hello/bin/mips/linux/hello \
>         --cpu-type=inorder --caches --cpu-clock=500MHz \
>         --l1d_size=16kB --l1i_size=16kB --l1d_assoc=1 --l1i_assoc=1 \
>         --cacheline_size=32 --num-cpus=1
> 
> Results:
> 
> - Without patch, simSeconds is 0.000040804594.
> - With patch, simSeconds is 0.000038657668.
> 
> The patch speeds up simulated runtime by 5.6% here. You can check the
> Exec debug-flag dump to see that memory instructions complete with
> different latencies with and without the patch.
> 
> For two instructions in the middle of the simulation, without the patch,
> the time difference is 5997 ps (3 cycles)::
> 
>     10073967: system.cpu T0 : @_int_malloc+992 : srl r9, r21 : IntAlu : D=0x0000000000000000
>     10079964: system.cpu T0 : @_int_malloc+996 : lw r25, -32740(r28) : MemRead : D=0x0000000010000000 A=0x10004fac
> 
> For the same instructions with the patch, the time difference is 3998 ps
> (2 cycles)::
> 
>     10071968: system.cpu T0 : @_int_malloc+992 : srl r9, r21 : IntAlu : D=0x0000000000000000
>     10075966: system.cpu T0 : @_int_malloc+996 : lw r25, -32740(r28) : MemRead : D=0x0000000010000000 A=0x10004fac
> 
> The correct L1 hit latency is 2 cycles.
> 
> For more information, you can look at src/sim/clocked_object.hh update()
> to see how the tick is incorrectly updated after a clocked object wakes
> up.
> 
> Here's the relevant section of code in src/sim/clocked_object.hh
> update()::
> 
>     void update() const
>     {
>         (...)
> 
>         // if not, we have to recalculate the cycle and tick, we
>         // perform the calculations in terms of relative cycles to
>         // allow changes to the clock period in the future
>         Cycles elapsedCycles(divCeil(curTick() - tick, clockPeriod()));
>         cycle += elapsedCycles;
>         tick += elapsedCycles * clockPeriod();
>     }
> 
> This part of update() is only executed when a clocked object has been
> inactive and has not updated its tick for a while. The code updates the
> tick using the current clockPeriod() -- basically it assumes that the
> clock period has not changed since the object went to sleep.
> 
> In the hello world example, the dcache goes out of sync with the rest of
> the system. You can see this by adding the following printout.
> 
> In src/sim/clocked_object.hh::
> 
>     inline Tick clockEdge(Cycles cycles = Cycles(0)) const
>     {
>         (update ...)
> 
>         std::cout << name() << " tick is ";
>         std::cout << std::dec << tick << std::endl;
> 
>         (return ...)
>     }
> 
> If you compile and re-simulate and go to those same two instructions we
> looked at earlier...
> 
> Without patch::
> 
>     10073967: system.cpu T0 : @_int_malloc+992 : srl r9, r21 : IntAlu : D=0x0000000000000000
>     system.cpu.dcache tick is 10075945
>     system.cpu.dcache tick is 10075945
>     system.cpu.dcache tick is 10075945
>     system.cpu tick is 10073967
>     system.cpu tick is 10075966
>     system.cpu tick is 10077965
>     10079964: system.cpu T0 : @_int_malloc+996 : lw r25, -32740(r28) : MemRead : D=0x0000000010000000 A=0x10004fac
> 
> The CPU's lw graduates at tick 10079964. Meanwhile, the dcache's tick is
> 10075945. This is a difference of 4019 ps, which is not a multiple of
> the 1999 ps clock period. This mismatch causes the L1 hit latency to
> increase from 2 cycles to 3 cycles.
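> 
> To make the arithmetic concrete, here is a rough standalone illustration
> (not the actual gem5 code; divCeil below is a local stand-in for the
> helper that update() uses) of how the out-of-phase 4019 ps gap gets
> rounded up to 3 cycles, while the in-phase 3998 ps gap stays at 2::
> 
>     // Illustration only: plugs the tick values from the dumps above
>     // into a round-up division with the 1999 ps period.
>     #include <cstdint>
>     #include <iostream>
> 
>     typedef uint64_t Tick;
> 
>     // Round-up integer division, like gem5's divCeil().
>     static Tick divCeil(Tick a, Tick b) { return (a + b - 1) / b; }
> 
>     int main()
>     {
>         const Tick period = 1999;  // ps, the clock period after scaling
> 
>         // Out of phase (without patch): CPU at 10079964, dcache stuck
>         // at 10075945 -> the 4019 ps gap is charged as 3 cycles.
>         std::cout << divCeil(10079964 - 10075945, period) << " cycles\n";
> 
>         // In phase (with patch): 10075966 - 10071968 = 3998 ps, which
>         // is exactly 2 cycles.
>         std::cout << divCeil(10075966 - 10071968, period) << " cycles\n";
> 
>         return 0;
>     }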
> 
> With patch::
> 
>     10071968: system.cpu T0 : @_int_malloc+992 : srl r9, r21 : IntAlu : D=0x0000000000000000
>     system.cpu.dcache tick is 10071968
>     system.cpu.dcache tick is 10071968
>     system.cpu.dcache tick is 10071968
>     system.cpu tick is 10071968
>     system.cpu tick is 10073967
>     10075966: system.cpu T0 : @_int_malloc+996 : lw r25, -32740(r28) : MemRead : D=0x0000000010000000 A=0x10004fac
> 
> The CPU's lw graduates at tick 10075966. Meanwhile, the dcache's tick is
> 10071968. This is a difference of 3998 ps, which is exactly 2 cycles.
> Since the cache and the CPU stay in step, the L1 hit latency stays at 2
> cycles even after frequency scaling.
> 
> 
> Thanks,
> 
> Christopher Torng
> 

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
