-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2120/#review4833
-----------------------------------------------------------



src/sim/clock_domain.hh
<http://reviews.gem5.org/r/2120/#comment4520>

    Spacing around the ( and ):
    
    ...Domain(ClockedObject *c)
    
    



src/sim/clock_domain.hh
<http://reviews.gem5.org/r/2120/#comment4521>

    It would be good to add asserts to check that c != NULL, and that c is
    not already in members.
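
    For example, a sketch (the method name is a placeholder; this assumes
    members is a std::vector<ClockedObject*>, with <algorithm> and
    <cassert> included):
    
        void
        ClockDomain::registerWithClockDomain(ClockedObject *c)
        {
            assert(c != NULL);
            assert(std::find(members.begin(), members.end(), c) ==
                   members.end());
            members.push_back(c);
        }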



src/sim/clock_domain.cc
<http://reviews.gem5.org/r/2120/#comment4524>

    I'd suggest sticking with auto m = members.begin()...
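
    For example (a sketch, assuming members holds ClockedObject pointers
    and updateClockPeriod is the public method suggested in the next
    comment):
    
        for (auto m = members.begin(); m != members.end(); ++m)
            (*m)->updateClockPeriod();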



src/sim/clock_domain.cc
<http://reviews.gem5.org/r/2120/#comment4522>

    I'd prefer to add a public method to the ClockedObject, called
    updateClockPeriod, just like with the domains.
    
    The updateClockPeriod method could just call the private update().
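
    A sketch of what I mean (update() is the existing private const member
    of ClockedObject):
    
        public:
            // Public wrapper so a clock domain can force a member to
            // resync its tick; it simply delegates to the private update().
            void updateClockPeriod() const { update(); }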



src/sim/clocked_object.hh
<http://reviews.gem5.org/r/2120/#comment4525>

    Spacing around (this)


You beat us to it :-)

Thanks for providing this functionality. I would perhaps shorten the 
description a bit and highlight that the patch adds support for DFS 
(dynamic frequency scaling); it was simply not intended to do this 
before, so it was not really a bug.

- Andreas Hansson


On Dec. 16, 2013, 3:28 p.m., Christopher Torng wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2120/
> -----------------------------------------------------------
> 
> (Updated Dec. 16, 2013, 3:28 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> -------
> 
> sim: Fix a bug when scaling a clock domain's frequency
> 
> A SrcClockDomain (src/sim/clock_domain.hh) can change its clock period with
> the clockPeriod(Tick) function. However, when a SrcClockDomain changes the
> clock period, the ClockedObjects in the clock domain may be inactive. When
> these ClockedObjects wake up, they update their internal tick incorrectly,
> leading to incorrect latency calculations throughout the system. For
> example, a 2 cycle L1 cache hit latency can turn into 3 cycles, causing
> significant slowdowns in simulated runtime after the frequency change (4% to
> 20% slowdowns in my experiments).
> 
> This patch fixes the bug by adding a member list to the ClockDomain --
> ClockedObjects register themselves with their clock domain at construction
> time and are added to the member list. Using this list, before a clock
> domain modifies its clock period, it can bring each member's tick up to
> curTick().
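> 
> A minimal sketch of the idea (method and member names here are
> illustrative, not necessarily the ones in the diff)::
> 
>     class ClockDomain
>     {
>       public:
>         // Called from the ClockedObject constructor.
>         void registerWithClockDomain(ClockedObject *c)
>         { members.push_back(c); }
> 
>       protected:
>         // Every ClockedObject tied to this domain.
>         std::vector<ClockedObject*> members;
>     };
> 
>     void
>     SrcClockDomain::clockPeriod(Tick clock_period)
>     {
>         // Sync every member's tick to curTick() under the old period
>         // before the new period takes effect.
>         for (auto m = members.begin(); m != members.end(); ++m)
>             (*m)->updateClockPeriod();
>         _clockPeriod = clock_period;
>     }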
> 
> Diffed from Changeset 9993
> 
> 
> Diffs
> -----
> 
>   src/sim/clocked_object.hh bdd606534bdc 
>   src/sim/clock_domain.cc bdd606534bdc 
>   src/sim/clock_domain.hh bdd606534bdc 
> 
> Diff: http://reviews.gem5.org/r/2120/diff/
> 
> 
> Testing
> -------
> 
>   Purely to demonstrate this bug, I've added the following code, using the
>   InOrder CPU.
> 
>   In src/sim/stat_control.cc (changed precision from 6 to 12 for more 
> detail)::
> 
>     simSeconds
>         .name("sim_seconds")
>         .desc("Number of seconds simulated")
>         .precision(12)
>         ;
> 
>   In src/sim/clocked_object.hh (added function for a clocked object to get 
> its clock domain's pointer)::
> 
>     inline ClockDomain* getClockDomain() const
>     {
>         ClockDomain *c = &clockDomain;
>         return c;
>     }
> 
>   In src/cpu/inorder/InOrderCPU.py (change stageWidth to 1 to make dumps 
> easier to read)::
> 
>     stageWidth = Param.Unsigned(1, "Stage width")
> 
>   In src/cpu/inorder/cpu.hh (create a pointer to a SrcClockDomain to control 
> the CPU's clock period from within the CPU)::
> 
>     SrcClockDomain *clockDomain;
> 
>   In src/cpu/inorder/cpu.cc (let CPU constructor set the clock domain 
> pointer)::
> 
>     InOrderCPU::InOrderCPU(Params *params)
>         : (...)
>     {
>         (...)
>         clockDomain = (SrcClockDomain*) getClockDomain();
>     }
> 
>   In src/cpu/inorder/cpu.cc (after 5000 cycles, change the clock period
>   to 1999 ps)::
> 
>     void
>     InOrderCPU::tick()
>     {
> 
>         (...)
> 
>         if (numCycles.value() > 5000)
>         {
>             clockDomain->clockPeriod(1999);
>         }
>     }
> 
>   After compiling these testing changes, I run MIPS hello world at 500 MHz
>   (2000 ps)::
> 
>     $ ./build/MIPS/gem5.opt configs/example/se.py -c 
> tests/test-progs/hello/bin/mips/linux/hello --cpu-type=inorder --caches 
> --cpu-clock=500MHz --l1d_size=16kB --l1i_size=16kB --l1d_assoc=1 
> --l1i_assoc=1 --cacheline_size=32 --num-cpus=1
> 
>   Results:
> 
>   - Without patch, simSeconds is 0.000040804594.
>   - With    patch, simSeconds is 0.000038657668.
> 
>   The patch speeds up simulated runtime by 5.6% here. You can check the Exec
>   debug-flag dump to see that memory instructions complete with different
>   latencies with and without the patch.
> 
>   Take two instructions from the middle of the simulation. Without the
>   patch, the time difference between them is 5997 ps (3 cycles)::
> 
>     10073967: system.cpu T0 : @_int_malloc+992    : srl        r9, r21         : IntAlu :  D=0x0000000000000000
>     10079964: system.cpu T0 : @_int_malloc+996    : lw         r25, -32740(r28) : MemRead :  D=0x0000000010000000 A=0x10004fac
> 
>   For the same instructions with the patch, the time difference is 3998 ps
>   (2 cycles)::
> 
>     10071968: system.cpu T0 : @_int_malloc+992    : srl        r9, r21         : IntAlu :  D=0x0000000000000000
>     10075966: system.cpu T0 : @_int_malloc+996    : lw         r25, -32740(r28) : MemRead :  D=0x0000000010000000 A=0x10004fac
> 
>   The correct L1 hit latency is 2 cycles.
> 
>   For more information, you can look at src/sim/clocked_object.hh update() to
>   see how the tick is incorrectly updated after a clocked object wakes up.
> 
>   Here's the relevant section of code in src/sim/clocked_object.hh update()::
> 
>     void update() const
>     {
>         (...)
> 
>         // if not, we have to recalculate the cycle and tick, we
>         // perform the calculations in terms of relative cycles to
>         // allow changes to the clock period in the future
>         Cycles elapsedCycles(divCeil(curTick() - tick, clockPeriod()));
>         cycle += elapsedCycles;
>         tick += elapsedCycles * clockPeriod();
>     }
> 
>   This part of update() is only executed when a clocked object has been
>   inactive and has not updated its tick for a while. The code advances the
>   tick using the current clockPeriod() -- implicitly assuming that the
>   clock period has not changed while the object was asleep.
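> 
>   As a worked example with hypothetical round numbers: say the period
>   changes from 2000 ps to 1999 ps at tick 10050000. An active member keeps
>   its tick synced to curTick(), so its later clock edges fall at
>   10050000 + k*1999. A member that has been asleep since tick 10000000
>   keeps its edges at 10000000 + k*1999. Since 10050000 - 10000000 = 50000
>   is not a multiple of 1999 (25 * 1999 = 49975), the two members' edges are
>   permanently offset by 25 ps, even though they were perfectly aligned
>   under the old 2000 ps period (50000 = 25 * 2000).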
> 
>   In the hello world example, the dcache goes out of sync with the rest of
>   the system. You can see this by adding the following printout.
> 
>   In src/sim/clocked_object.hh::
> 
>     inline Tick clockEdge(Cycles cycles = Cycles(0)) const
>     {
>         (update ...)
> 
>         std::cout << name() << " tick is ";
>         std::cout << std::dec << tick << std::endl;
> 
>         (return ...)
>     }
> 
>   If you recompile, re-simulate, and look at the same two instructions we
>   examined earlier...
> 
>   Without patch::
> 
>     10073967: system.cpu T0 : @_int_malloc+992    : srl        r9, r21         : IntAlu :  D=0x0000000000000000
>     system.cpu.dcache tick is 10075945
>     system.cpu.dcache tick is 10075945
>     system.cpu.dcache tick is 10075945
>     system.cpu tick is 10073967
>     system.cpu tick is 10075966
>     system.cpu tick is 10077965
>     10079964: system.cpu T0 : @_int_malloc+996    : lw         r25, -32740(r28) : MemRead :  D=0x0000000010000000 A=0x10004fac
> 
>   The CPU's lw graduates at tick 10079964. Meanwhile the dcache has tick
>   10075945. This is a difference of 4019, which is not a multiple of the
>   1999 ps clock period: the cache's clock edges are now offset from the
>   CPU's, so each request gets rounded up to the cache's next edge, and that
>   partial cycle becomes a whole extra one. The L1 hit latency increases
>   from 2 cycles to 3 cycles.
> 
>   With patch::
> 
>     10071968: system.cpu T0 : @_int_malloc+992    : srl        r9, r21         : IntAlu :  D=0x0000000000000000
>     system.cpu.dcache tick is 10071968
>     system.cpu.dcache tick is 10071968
>     system.cpu.dcache tick is 10071968
>     system.cpu tick is 10071968
>     system.cpu tick is 10073967
>     10075966: system.cpu T0 : @_int_malloc+996    : lw         r25, -32740(r28) : MemRead :  D=0x0000000010000000 A=0x10004fac
> 
>   The CPU's lw graduates at tick 10075966. Meanwhile the dcache has tick
>   10071968. This is a difference of 3998, which is exactly 2 cycles. Since
>   the cache's and CPU's clock edges stay aligned, the L1 hit latency stays
>   at 2 cycles even after frequency scaling.
> 
> 
> Thanks,
> 
> Christopher Torng
> 
>
