On Sun, 16 Dec 2012, hanfeng QIN wrote:

I am sorry that we have different opinions. First, I think you should refer to this paper to understand the simulation methodology I mentioned.


I don't think you have any opinion. If you had one, you would have clearly stated why you believe the experiment you want to conduct makes sense. You are just trying to do what someone else has done.

Zhan, D. Locality & Utility Co-optimization for Practical Capacity Management of Shared Last Level Caches. ICS'12

Since you know whose methodology you are trying to replicate, it is advisable that you contact the author(s) directly about what exactly they did. In fact, since the author(s) used the M5 simulator, it should be straightforward to replicate the changes that might have been made to the simulator.


For your convenience, I have extracted the simulation methodology here.

"In the experiments, all threads under a given workload are executed starting from a checkpoint that has already had the first 10 billion instructions

It is not clear whether the 10 billion instructions is the sum over all the threads, or that each individual thread had executed 10 billion instructions.

bypassed. They are cache-warmed with 1 billion instructions and then

Again, it is not clear if 1 billion instructions is across all threads, or for an individual thread.

simulated in detail until all threads finish another 1 billion instructions. Performance statistics are reported for a thread when it reaches 1 billion instructions. If one thread completes the 1 billion instructions before others, it continues to run so as to still compete for the SLLC capacity, but its extra instructions are not taken into account in the final performance report. This is in conformation with the standard practice in CMP cache research"

From this paragraph, I can infer that each thread would have executed 1 billion instructions after the cache warm-up phase. After reading section 5.4 of the thesis by the author named above, it seems to me that when a hardware thread completed a billion instructions for the first time, he noted the IPC for that thread. Finally, these recorded IPCs were summed up and used to compare different cache replacement policies.


For the 1st question, I do not insist on exactly N3 instructions. In practice it is not possible to stop at an exact instruction count anyway. But to stay consistent with the simulation methodology above, I have to ensure that each core executes at least N3 instructions. I reviewed the current implementation of the '-I' option in configs/common/Simulation.py and src/cpu/base.cc: it simply passes the '-I' value to cpu[i].max_insts_any_thread. This only guarantees that the simulation exits once one core commits N3 instructions, regardless of how many instructions the other cores have retired.
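One possible way to get per-core limits is sketched below against the gem5 Python scripting interface. This is only a sketch, not tested code: `max_insts_any_thread`, `m5.simulate()`, and `m5.stats.dump()` exist in gem5, but the exact exit-cause string and the overall flow are my assumptions and may differ between versions.

```python
# Sketch only: set a per-core instruction limit instead of relying on the
# single global '-I', then keep simulating until every core has committed at
# least N3 instructions, dumping a stats snapshot each time a core gets there.
import m5

N3 = 1000000000  # one billion instructions, per the paper's methodology

for cpu in system.cpu:               # 'system' comes from the config script
    cpu.max_insts_any_thread = N3    # must be set before m5.instantiate()

m5.instantiate()

finished = 0
while finished < len(system.cpu):
    event = m5.simulate()
    # Assumption: this is the cause string gem5 reports when a max_insts_*
    # limit is hit; it may vary across versions.
    if event.getCause() == "a thread reached the max instruction count":
        m5.stats.dump()              # one dump per core that reaches N3
        finished += 1
    else:
        break                        # some other exit condition occurred
```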

For the 2nd question: some programs may finish N3 instructions before others. If each program runs only N3 instructions and we report its stats after those N3 instructions, I don't think the stats can mirror the real impact of shared-resource contention, since during some phases (after the early finishers stop) no contention exists at all. On the other hand, by forcing every program to run N3 * 2 instructions, we can report each program's stats after its first N3 instructions while the other programs are still running, so the stats reflect the impact of shared-resource contention.
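As a toy illustration of this bookkeeping (plain Python, not gem5 code; the IPC numbers are made up), each program keeps committing instructions up to N3 * 2 so that it still contends, but its reported IPC covers only its first N3 instructions:

```python
# Toy model (not gem5): program i commits ipcs[i] instructions per cycle.
# Every program runs until it hits 2 * N3 so it keeps contending for shared
# resources, but its reported IPC only covers its first N3 instructions,
# matching the methodology quoted above.

def simulate_reported_ipcs(ipcs, n3):
    committed = [0] * len(ipcs)
    first_n3_cycle = [None] * len(ipcs)
    cycle = 0
    while any(c is None for c in first_n3_cycle):
        cycle += 1
        for i, ipc in enumerate(ipcs):
            if committed[i] < 2 * n3:          # extra work still contends...
                committed[i] += ipc
            if first_n3_cycle[i] is None and committed[i] >= n3:
                first_n3_cycle[i] = cycle      # ...but stats stop counting here
    return [n3 / first_n3_cycle[i] for i in range(len(ipcs))]
```

For example, with two programs running at 2 and 1 instructions per cycle and N3 = 10, the reported IPCs are 2.0 and 1.0: the faster program's instructions beyond its first 10 are executed (for contention) but excluded from its report.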

Of course, I think the above simulation methodology still has pitfalls. For a program with a short lifetime, even if it executes N3 * 2 instructions, we still cannot guarantee that it will contend for shared resources with the other programs.

I implemented this methodology in gem5. For an M-program multiprogrammed workload, it dumps (M + 1) sets of stats as expected. But I have not yet worked out how to use the dump order to extract per-core information. E.g., if the dump order is c1->c2->c0->c3, then the stats related to c1 are in the 1st dump, those related to c2 in the 2nd dump, and so on. Note that the stats dump order mirrors the order in which the programs finish their N3 instructions.
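If the finish order is logged as the exit events occur (e.g. by printing the CPU name each time a core hits its limit), the dump-to-core mapping falls out directly. A trivial helper for this bookkeeping (my own hypothetical code, not part of gem5):

```python
def dump_index_by_core(finish_order):
    """Map each core id to the index of the stats dump taken when that core
    reached N3 instructions. finish_order is the order in which the cores
    finished, e.g. [1, 2, 0, 3] for the order c1 -> c2 -> c0 -> c3."""
    return {core: idx for idx, core in enumerate(finish_order)}
```

So for the order c1->c2->c0->c3, core c1's stats come from dump 0, c2's from dump 1, c0's from dump 2, and c3's from dump 3. Keep in mind that, as far as I recall, gem5 stats dumps are cumulative unless m5.stats.reset() is called between them, so per-window values may need to be computed by subtracting adjacent dumps.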



It is not too hard to print when a thread has executed a billion instructions.

--
Nilay
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
