On Sun, 16 Dec 2012, hanfeng QIN wrote:

I am sorry that we have different opinions. First, I think you should refer to this paper to understand the simulation methodology I mentioned.


I don't think you have any opinion. If you had one, you would have clearly stated why you believe the experiment you want to conduct makes sense. You are just trying to do what someone else has done.

Zhan, D. Locality & Utility Co-optimization for Practical Capacity Management of Shared Last Level Caches. ICS'12

Since you know whose methodology you are trying to replicate, it is advisable that you contact the author(s) directly about what exactly they did. In fact, since the author(s) used the M5 simulator, it should be straightforward to replicate the changes that might have been made to the simulator.


For your convenience, I have extracted the simulation methodology here.

"In the experiments, all threads under a given workload are executed starting from a checkpoint that has already had the first 10 billion instructions

It is not clear whether the 10 billion instructions is the sum over all the threads, or that each individual thread had executed 10 billion instructions.

bypassed. They are cache-warmed with 1 billion instructions and then

Again, it is not clear if 1 billion instructions is across all threads, or for an individual thread.

simulated in detail until all threads finish another 1 billion instructions. Performance statistics are reported for a thread when it reaches 1 billion instructions. If one thread completes the 1 billion instructions before others, it continues to run so as to still compete for the SLLC capacity, but its extra instructions are not taken into account in the final performance report. This is in conformation with the standard practice in CMP cache research"

From this paragraph, I can infer that each thread would have executed 1 billion instructions after the cache warm-up phase. After reading section 5.4 of the thesis by the author named above, it seems to me that when a hardware thread completed a billion instructions for the first time, he noted the IPC for that thread. Finally, these recorded IPCs were summed up and used to compare different cache replacement policies.


For the 1st question, I do not insist on exactly N3 instructions. In practice it is not possible to stop at an exact instruction count anyway. But to stay consistent with the simulation methodology above, I have to ensure that each core executes at least N3 instructions. I reviewed the current implementation of the '-I' option in configs/common/Simulation.py and src/cpu/base.cc: it simply passes the '-I' value to cpu[i].max_insts_any_thread. This only guarantees that the simulation exits once one core commits N3 instructions, regardless of how many instructions the other cores have retired.
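One possible way to get per-core limits is sketched below against the gem5 Python scripting interface. This is only a sketch, not tested code: `max_insts_any_thread`, `m5.simulate()`, and `m5.stats.dump()` exist in gem5, but the exact exit-cause string and the overall flow are my assumptions and may differ between versions.

```python
# Sketch only: set a per-core instruction limit instead of relying on the
# single global '-I', then keep simulating until every core has committed at
# least N3 instructions, dumping a stats snapshot each time a core gets there.
import m5

N3 = 1000000000  # one billion instructions, per the paper's methodology

for cpu in system.cpu:               # 'system' comes from the config script
    cpu.max_insts_any_thread = N3    # must be set before m5.instantiate()

m5.instantiate()

finished = 0
while finished < len(system.cpu):
    event = m5.simulate()
    # Assumption: this is the cause string gem5 reports when a max_insts_*
    # limit is hit; it may vary across versions.
    if event.getCause() == "a thread reached the max instruction count":
        m5.stats.dump()              # one dump per core that reaches N3
        finished += 1
    else:
        break                        # some other exit condition occurred
```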

For the 2nd question: some programs may finish N3 instructions before others. If each program runs only N3 instructions and we report its stats after those N3 instructions, I don't think the stats can mirror the real impact of shared-resource contention, since during some phases (after the early finishers stop) no contention exists at all. On the other hand, by forcing every program to run N3 * 2 instructions, we can report each program's stats after its first N3 instructions while the other programs are still running, so the stats reflect the impact of shared-resource contention.
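As a toy illustration of this bookkeeping (plain Python, not gem5 code; the IPC numbers are made up), each program keeps committing instructions up to N3 * 2 so that it still contends, but its reported IPC covers only its first N3 instructions:

```python
# Toy model (not gem5): program i commits ipcs[i] instructions per cycle.
# Every program runs until it hits 2 * N3 so it keeps contending for shared
# resources, but its reported IPC only covers its first N3 instructions,
# matching the methodology quoted above.

def simulate_reported_ipcs(ipcs, n3):
    committed = [0] * len(ipcs)
    first_n3_cycle = [None] * len(ipcs)
    cycle = 0
    while any(c is None for c in first_n3_cycle):
        cycle += 1
        for i, ipc in enumerate(ipcs):
            if committed[i] < 2 * n3:          # extra work still contends...
                committed[i] += ipc
            if first_n3_cycle[i] is None and committed[i] >= n3:
                first_n3_cycle[i] = cycle      # ...but stats stop counting here
    return [n3 / first_n3_cycle[i] for i in range(len(ipcs))]
```

For example, with two programs running at 2 and 1 instructions per cycle and N3 = 10, the reported IPCs are 2.0 and 1.0: the faster program's instructions beyond its first 10 are executed (for contention) but excluded from its report.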

Of course, I think the above simulation methodology still has pitfalls. For a program with a short lifetime, even if it executes N3 * 2 instructions, we still cannot guarantee that it will contend for shared resources with the other programs.

I implemented this methodology in gem5. For an M-program multiprogrammed workload, it dumps (M + 1) sets of stats as expected. But I have not yet worked out how to use the dump order to extract per-core information. E.g., if the dump order is c1->c2->c0->c3, then the stats related to c1 are in the 1st dump, those related to c2 in the 2nd dump, and so on. Note that the stats dump order mirrors the order in which the programs finish their N3 instructions.
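If the finish order is logged as the exit events occur (e.g. by printing the CPU name each time a core hits its limit), the dump-to-core mapping falls out directly. A trivial helper for this bookkeeping (my own hypothetical code, not part of gem5):

```python
def dump_index_by_core(finish_order):
    """Map each core id to the index of the stats dump taken when that core
    reached N3 instructions. finish_order is the order in which the cores
    finished, e.g. [1, 2, 0, 3] for the order c1 -> c2 -> c0 -> c3."""
    return {core: idx for idx, core in enumerate(finish_order)}
```

So for the order c1->c2->c0->c3, core c1's stats come from dump 0, c2's from dump 1, c0's from dump 2, and c3's from dump 3. Keep in mind that, as far as I recall, gem5 stats dumps are cumulative unless m5.stats.reset() is called between them, so per-window values may need to be computed by subtracting adjacent dumps.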



It is not too hard to print when a thread has executed a billion instructions.

--
Nilay
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
