I am sorry that we have differing opinions. I think you should first refer to this paper to understand the simulation methodology I mentioned.

Zhan, D. Locality & Utility Co-optimization for Practical Capacity Management of Shared Last Level Caches. ICS'12

For your convenience, I have extracted the simulation methodology here.

"In the experiments, all threads under a given workload are executed starting from a checkpoint that has already had the first 10 billion instructions bypassed. They are cache-warmed with 1 billion instructions and then simulated in detail until all threads finish another 1 billion instructions. Performance statistics are reported for a thread when it reaches 1 billion instructions. If one thread completes the 1 billion instructions before others, it continues to run so as to still compete for the SLLC capacity, but its extra instructions are not taken into account in the final performance report. This is in conformation with the standard practice in CMP cache research"

For the 1st question, I do not insist on exactly N3 instructions at all. In fact, it is not possible to count exact instruction numbers. But to stay consistent with the above simulation methodology, I have to enforce that each core executes at least N3 instructions. I reviewed the current implementation of the option '-I' in configs/common/Simulation.py and src/cpu/base.cc. It just passes the '-I' value to cpu[i].MAX_INSTS_ANY_THREAD. In this case it only guarantees that the simulation exits once one core commits N3 instructions, no matter how many instructions have retired from the other cores.
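To make the difference concrete, here is a toy standalone model (not gem5 code; the rates and functions are purely illustrative) contrasting "exit when any core reaches N3" with "run until every core has reached N3":

```python
# Toy model: each core retires instructions at a fixed rate (insts per
# tick). Compare when the simulation stops under the two policies.

def exit_tick_any_thread(rates, n3):
    """Exit as soon as ANY core commits n3 instructions (the '-I' behavior)."""
    return min(n3 / r for r in rates)

def exit_tick_all_cores(rates, n3):
    """Exit only once EVERY core has committed at least n3 instructions."""
    return max(n3 / r for r in rates)

rates = [2.0, 1.0, 0.5]          # hypothetical per-core retire rates
n3 = 1_000_000_000

t_any = exit_tick_any_thread(rates, n3)
t_all = exit_tick_all_cores(rates, n3)
# How far the slowest core has gotten when the '-I' exit fires:
slowest_at_exit = rates[-1] * t_any
print(t_any, t_all, slowest_at_exit)
```

With these made-up rates, the slowest core has retired only a quarter of N3 when the '-I' exit fires, which is exactly the problem described above.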

For the 2nd question, consider that some programs may finish their N3 instructions before the others. If we run only N3 instructions per program in total and report stats after those N3 instructions, I don't think the stats can mirror the real impact of shared-resource contention, since during some phases no contention exists at all. On the other hand, by enforcing that every program runs N3 * 2 instructions, we have the chance to report stats after the first N3 instructions, so the stats can reflect the impact of shared-resource contention.
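The proposed methodology can be sketched as a toy loop (assumptions: fixed per-core retire rates, and "stats" reduced to a bare instruction counter; none of this is gem5 code): each core runs 2*N3 instructions, but its reported stats are snapshotted the moment it crosses N3, while the other cores are still running and competing.

```python
# Toy sketch of "run 2*n3 per core, snapshot each core's stats at n3".

def simulate(rates, n3):
    counts = [0.0] * len(rates)
    snapshots = {}                       # core id -> insts at snapshot time
    while any(c < 2 * n3 for c in counts):
        for i, r in enumerate(rates):
            if counts[i] < 2 * n3:
                counts[i] += r           # core keeps retiring until 2*n3
            if i not in snapshots and counts[i] >= n3:
                snapshots[i] = counts[i] # "dump" this core's stats at n3
    return snapshots, counts

snaps, final = simulate([2.0, 1.0, 0.5], 1000)
print(snaps, final)
```

Note the toy model also exposes the pitfall mentioned below: with these rates, the fastest core has already finished its 2*n3 budget before the slowest core's snapshot is taken, so even the doubled budget does not guarantee contention for the whole measured window.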

Of course, I think the above simulation methodology still has pitfalls. For a program with a short lifetime, even if it executes N3 * 2 instructions, we still cannot guarantee that it will contend for shared resources with the other programs.

I implemented this methodology in gem5. For an M-program multiprogrammed workload, it dumps (M + 1) stats as expected. But I have not yet worked out the dump order needed to extract per-core information. E.g., if the dump order is c1->c2->c0->c3, then we get the stats related to c1 in the 1st dump, those related to c2 in the 2nd dump, and so on. Note that the stats dump order mirrors the order in which the programs finish their N3 instructions.
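One way to recover that mapping in post-processing, rather than assuming it, is to read the per-core committed-instruction counters out of each dump: the core a dump "belongs to" is the one that newly crossed N3 since the previous dump. A sketch (the dump structure here is hypothetical; a real script would first parse stats.txt into such per-core counters):

```python
# Each dump is modeled as {core id: committed insts at dump time}.
# The k-th dump belongs to the core that newly crossed n3 at that dump.

def dump_order(dumps, n3):
    order, crossed = [], set()
    for dump in dumps:
        now = {c for c, insts in dump.items() if insts >= n3}
        new = now - crossed              # core(s) that crossed since last dump
        if len(new) == 1:
            order.append(new.pop())
        crossed = now
    return order

# Example matching the c1->c2->c0->c3 scenario above (made-up numbers):
dumps = [
    {"c0": 600,  "c1": 1000, "c2": 900,  "c3": 300},
    {"c0": 750,  "c1": 1200, "c2": 1000, "c3": 400},
    {"c0": 1000, "c1": 1500, "c2": 1300, "c3": 550},
    {"c0": 1300, "c1": 1900, "c2": 1700, "c3": 1000},
]
print(dump_order(dumps, 1000))
```

This recovers c1, c2, c0, c3 for the example data, so the dump-to-core association no longer depends on knowing the finish order in advance.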


Thanks,
Hanfeng


On 12/14/2012 06:52 PM, Nilay Vaish wrote:
On Fri, 14 Dec 2012, hanfeng QIN wrote:

I know the options '-F' and '-W'. Actually, I use them together with the '-I' option to specify the detailed instruction count (denoted N3 in my previous mail). The default implementation in configs/common/Simulation.py passes N3 to cpu[i].MAX_INSTS_ANY_THREAD, so the whole simulation exits as soon as any program finishes N3 instructions. Therefore I modified this default implementation to pass N3 to cpu[i].MAX_INSTS_ALL_THREADS, which forces each program to commit at least N3 instructions; the total number of instructions simulated is then N3 * Nr_cores. But this approach has a pitfall compared with the methodology I referred to. For a multiprogrammed workload, once some program finishes its N3 instructions, the corresponding core has no task to schedule (I assume the number of programs is no more than the number of simulated cores). Thus it may be unreasonable to evaluate its impact on shared-resource contention from the final statistics report.
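The modification described above amounts to something like the following config fragment (a sketch only: `testsys`, `np`, and `options.maxinsts` follow the names conventionally used in Simulation.py of that era, but the exact surrounding code varies by gem5 revision):

```python
# Sketch of the change described above, inside configs/common/Simulation.py.
# Parameter names are the lowercase BaseCPU params behind the uppercase
# names used in this thread.
for i in range(np):
    # Default: exit once ANY core commits maxinsts instructions.
    # testsys.cpu[i].max_insts_any_thread = options.maxinsts
    # Modified: each core must commit at least maxinsts instructions.
    testsys.cpu[i].max_insts_all_threads = options.maxinsts
```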

I don't understand why you are so insistent that every core has to execute exactly N3 number of instructions. A much more realistic experiment would be one where each core has executed at least N3 instructions. If you understand how the option -I has been implemented, it should be straight forward for you to modify gem5 and dump stats when all cores have executed at least N3 instructions.


Based on this, I have an idea to report statistics more reasonably. Can we carry out detailed simulation of N3 * 2 instructions for each program (so the total number of instructions simulated is (N3 * 2) * Nr_cores) but only dump the stats after the first N3 instructions? However, I am not clear on the stats-dump internals.


I don't see why executing twice the number of instructions would make any difference. Depending on the latencies in the system, the ratio of the IPCs of two cores can be very low/high.

I would rather suggest that you think more about the experiment you are proposing. Why is it essential that each core has executed exactly N3 instructions? Is this experiment realistic?

--
Nilay

_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users