Nilay, I agree with you. I think the dependencies of those flag bits should be evaluated at bit level.
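
For illustration only, here is a minimal Python sketch of the difference between register-level and bit-level flag dependency tracking. It is not gem5 code: the function names, the (reads, writes) tuple encoding, and the choice of DF as the one "retained" bit are assumptions made for the example. The bit masks use the standard x86 EFLAGS positions. In the register-level model, an instruction that writes only some flag bits must read the old register value to preserve the rest, so back-to-back ADDs form a chain. In the bit-level model, an ADD that reads no flag bits has no read-after-write dependency.

```python
# Hypothetical model, not gem5 code. Each instruction is (flag_bits_read, flag_bits_written).
CF, PF, AF, ZF, SF, DF, OF = 1 << 0, 1 << 2, 1 << 4, 1 << 6, 1 << 7, 1 << 10, 1 << 11
ARITH_FLAGS = CF | PF | AF | ZF | SF | OF   # bits an ADD updates
ALL_FLAGS = ARITH_FLAGS | DF                # assumed contents of the single flag register


def raw_deps_register_level(insts):
    """Flag register renamed as one unit: a partial writer must read the old value to merge."""
    deps, last_writer = [], None
    for i, (reads, writes) in enumerate(insts):
        needs_old = reads != 0 or (writes != 0 and writes != ALL_FLAGS)
        deps.append({last_writer} if needs_old and last_writer is not None else set())
        if writes:
            last_writer = i
    return deps


def raw_deps_bit_level(insts):
    """Each flag bit tracked separately: depend only on writers of bits actually read."""
    deps, last_writer = [], {}  # last_writer maps a single-bit mask to an instruction index
    for i, (reads, writes) in enumerate(insts):
        deps.append({w for bit, w in last_writer.items() if reads & bit})
        bit = 1
        while bit <= writes:
            if writes & bit:
                last_writer[bit] = i
            bit <<= 1
    return deps


# Three independent ADDs: each reads no flag bits and writes the arithmetic flags.
adds = [(0, ARITH_FLAGS)] * 3
print(raw_deps_register_level(adds))  # [set(), {0}, {1}]
print(raw_deps_bit_level(adds))       # [set(), set(), set()]
```

An instruction that really consumes a flag, such as an ADC reading CF, would still pick up a dependency in the bit-level model, which is the behavior one would want to keep.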

Gabe and others,

This change seems invasive. Do you know the best way to handle this?

Yasuko

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nilay Vaish
Sent: Thursday, April 05, 2012 3:35 AM
To: gem5 Developer List
Subject: Re: [gem5-dev] Data dependency caused by flags

The code for the function genFlags() in src/arch/x86/insts/microregop.cc
suggests that the values of flag bits not updated by the ADD instruction
need to be retained. This means that the previous values need to be read
and written again, so the second ADD can be dependent on a value written
by the first ADD. If the dependencies were evaluated at bit level, then
these instructions would not be dependent.

--
Nilay

On Thu, 5 Apr 2012, Watanabe, Yasuko wrote:

> I ran O3 CPU in FS mode in x86 with a simple microbenchmark and got a
> much lower IPC than the theoretical IPC. The issue seems to be data
> dependencies caused by (control) flags, not registers, and I am
> wondering if anyone has come across the same issue.
>
> The microbenchmark has many data independent ADD instructions
> (http://repo.gem5.org/gem5/file/570b44fe6e04/src/arch/x86/isa/insts/general_purpose/arithmetic/add_and_subtract.py#l41)
> in a loop. On a 2-wide out-of-order machine with enough resources, the
> IPC should be two at steady state. However, the IPC only goes up to
> one. What is happening is that even though the ADDs have two source
> registers, one destination register, and a flag to set in x86, gem5
> adds one extra flag source register to the ADDs. As a result, each ADD
> becomes dependent on the earlier ADD's destination flag, constraining
> the achievable IPC to one.
>
> Here is an example sequence with physical register mappings:
> ADD: S1=98, S2=9, S3=2, D1=82, D2=105 (flag)
> ADD: S1=92, S2=9, S3=105 (flag), D1=79, D2=90 ...
>
> Physical registers 98, 9, and 92 are ready when those two ADDs are
> renamed; however, as you can see, the second ADD has to wait for the
> first ADD because of the extra flag source register S3. When I removed
> those flags in the macroop definition, the IPC jumped up from 1 to 1.7.
>
> Does anyone know why the ADD has to read the flags, even though the
> x86 manual does not say that? Those flags should just cause a
> write-after-write dependency, not a read-after-write one.
>
> Yasuko
>
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev
