Hi all (specifically Gabe),

I was trying to run some tests that use the RDTSCP instruction, and I found that the current implementation of the rdtsc micro-op isn't quite serializing enough. From the Intel manual:
> The RDTSCP instruction is not a serializing instruction, but it does wait until all previous instructions have executed and all previous loads are globally visible. But it does not wait for previous stores to be globally visible, and subsequent instructions may begin execution before the read operation is performed. The following items may guide software seeking to order executions of RDTSCP:
>
> • If software requires RDTSCP to be executed only after all previous stores are globally visible, it can execute MFENCE immediately before RDTSCP.
> • If software requires RDTSCP to be executed prior to execution of any subsequent instruction (including any memory accesses), it can execute LFENCE immediately after RDTSCP.

This sounds like the micro-op should be "serializing before" in gem5's parlance. I believe that the two instructions RDTSC and RDTSCP have the same semantics, but that is not clearly stated in the instruction manual. I don't see any reason not to implement them the same way in gem5. Correct me if I'm wrong.

In testing, I found that making the macro-op serializing doesn't work, because that only serializes the final micro-op, and by then the TSC has already been read. For instance, take the following code sequence:

    rdtscp
    load miss
    rdtscp

On real hardware, the difference between the two counter values is roughly the load miss time. In gem5, the difference is ~4 cycles.

I've found that this can be fixed by adding the following line to the RDTSC micro-op implementation that is generated from the ISA description in decode-ns.cc.inc:

    flags[IsSerializeBefore] = true;

After this change, gem5 reports numbers closer to real hardware. I can't figure out the "right" way to get this code generated, though! I assume I need to somehow change the rdtsc micro-op definition in regop.isa. Any help would be greatly appreciated!

One other quick question: RDTSCP is supposed to return the CPU number along with the TSC value. Any hints as to how to get this from the ISA language?
Would the best way be to add a new micro-op for this?

Thanks,
Jason
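
P.S. In case it helps anyone reproduce this, here is a minimal sketch of the kind of test I mean. It assumes GCC or Clang on x86-64 and uses the compiler intrinsics from <x86intrin.h>. The names TimedLoad and time_load_miss are made up for illustration. The fences are not part of the three-instruction sequence above; they follow the MFENCE/LFENCE guidance in the manual text. The aux field is the IA32_TSC_AUX value that RDTSCP returns in ECX, which the OS typically initializes to identify the CPU.

```cpp
#include <cstdint>
#include <x86intrin.h>  // __rdtscp, _mm_clflush, _mm_mfence, _mm_lfence (GCC/Clang, x86 only)

// Result of one timed load: elapsed TSC ticks plus the IA32_TSC_AUX value
// that RDTSCP returns alongside the counter.
struct TimedLoad {
    std::uint64_t ticks;
    std::uint32_t aux;
};

// Hypothetical helper: times a single load that is forced to miss in the cache.
inline TimedLoad time_load_miss(volatile int *addr) {
    // Evict the cache line so the load below misses. The MFENCE makes sure
    // the flush has completed before the first timestamp is taken.
    _mm_clflush(const_cast<int *>(addr));
    _mm_mfence();

    unsigned int aux_start = 0;
    unsigned int aux_end = 0;

    std::uint64_t start = __rdtscp(&aux_start);
    _mm_lfence();        // keep the load from starting before the first read

    int value = *addr;   // the load miss being timed
    (void)value;

    // RDTSCP waits for previous instructions (including the load) to execute.
    std::uint64_t end = __rdtscp(&aux_end);
    _mm_lfence();        // keep later instructions from running ahead of the second read

    return TimedLoad{end - start, aux_end};
}
```

Calling time_load_miss on the address of any int gives the tick count for one missed load. On real hardware I would expect that to be on the order of the memory latency; with the current gem5 behaviour it would come out as only a few cycles.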
