I ran the O3 CPU in x86 FS mode with a simple microbenchmark and got a much 
lower IPC than the theoretical maximum. The cause seems to be data dependencies 
carried through the (status) flags rather than through registers, and I am 
wondering whether anyone has come across the same issue.

The microbenchmark has many data-independent ADD instructions 
(http://repo.gem5.org/gem5/file/570b44fe6e04/src/arch/x86/isa/insts/general_purpose/arithmetic/add_and_subtract.py#l41)
 in a loop. On a 2-wide out-of-order machine with enough resources, the IPC 
should be two at steady state. However, the IPC only reaches one. The reason is 
that, although an x86 ADD has two source registers, one destination register, 
and a flag result to set, gem5 adds one extra flag source register to each ADD. 
As a result, each ADD becomes dependent on the previous ADD's destination flag, 
which limits the achievable IPC to one.

Here is an example sequence with physical register mappings:
ADD: S1=98, S2=9, S3=2, D1=82, D2=105 (flag)
ADD: S1=92, S2=9, S3=105 (flag), D1=79, D2=90
...

Physical registers 98, 9, and 92 are ready when those two ADDs are renamed; 
however, as you can see, the second ADD has to wait for the first ADD because 
of the extra flag source register S3. When I removed those flags in the macroop 
definition, the IPC jumped up from 1 to 1.7.
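
To make the effect concrete, here is a rough toy model in Python, not gem5 
code. It assumes single-cycle ADDs, register sources that are always ready, 
unlimited physical registers, and an issue width of two; the function name 
simulate_ipc and its parameters are made up for this illustration. It only 
shows why a flag read serializes the ADDs, and it ignores every other limit 
in a real pipeline, so it gives the ideal 2.0 rather than the 1.7 I measured.

```python
def simulate_ipc(num_adds, width, flag_is_source):
    """Return the IPC of a toy out-of-order core running independent ADDs.

    Hypothetical model: every ADD has a latency of one cycle, and its
    register sources are ready at rename. If flag_is_source is True, each
    ADD additionally reads the flag register written by the previous ADD.
    """
    # done_cycle[i] is the first cycle in which ADD i's results are usable.
    done_cycle = [None] * num_adds
    pending = list(range(num_adds))
    cycle = 0

    while pending:
        issued_now = []
        for i in pending:  # oldest first
            if len(issued_now) == width:
                break
            if flag_is_source and i > 0:
                # Read-after-write on the previous ADD's flag result.
                prev_done = done_cycle[i - 1]
                ready = prev_done is not None and prev_done <= cycle
            else:
                # Flags are write-only, so renaming removes the dependency.
                ready = True
            if ready:
                issued_now.append(i)

        # Results of ADDs issued this cycle become visible next cycle.
        for i in issued_now:
            done_cycle[i] = cycle + 1
            pending.remove(i)
        cycle += 1

    return num_adds / cycle


if __name__ == "__main__":
    print("flag read as source:", simulate_ipc(1000, 2, True))
    print("flag written only:  ", simulate_ipc(1000, 2, False))
```

With the flag as a source, only one ADD can issue per cycle no matter how 
wide the machine is; with the flag as a destination only, both issue slots 
are filled every cycle.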

Does anyone know why the ADD has to read the flags, even though the x86 manual 
does not say that it does? Those flags should cause only a write-after-write 
dependency, not a read-after-write dependency.

Yasuko

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
