Yes, you guys are right. This is a recognized problem, and I've made some changes over time which should make it easier to handle flags the way a real x86 CPU would. I haven't fixed it yet, but it's on the horizon. I tend to be very busy, although circumstances may mean I have a little more or less time than normal for a little while, so I don't know for sure when I'll get it fixed. If you have an idea of how to get it to do what you want locally, feel free to go ahead. That will get you going, and when I get it fixed for real you can start using that.

Gabe

On 04/05/12 17:18, Watanabe, Yasuko wrote:
> Nilay,
>
> I agree with you. I think the dependencies of those flag bits should be
> evaluated at bit level.
>
> Gabe and others,
>
> This change seems invasive. Do you know the best way to handle this?
>
> Yasuko
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf
> Of Nilay Vaish
> Sent: Thursday, April 05, 2012 3:35 AM
> To: gem5 Developer List
> Subject: Re: [gem5-dev] Data dependency caused by flags
>
> The code for the function genFlags() in src/arch/x86/insts/microregop.cc
> suggests that the values of flag bits not updated by the ADD instruction need
> to be retained. This means that the previous values need to be read and
> written again, which means the second ADD can be dependent on a value written
> by the first ADD. If the dependencies were evaluated at bit level, then these
> instructions would not be dependent.
>
> --
> Nilay
>
> On Thu, 5 Apr 2012, Watanabe, Yasuko wrote:
>
>> I ran O3 CPU in FS mode in x86 with a simple microbenchmark and got a
>> much lower IPC than the theoretical IPC. The issue seems to be data
>> dependencies caused by (control) flags, not registers, and I am
>> wondering if anyone has come across the same issue.
>>
>> The microbenchmark has many data independent ADD instructions
>> (http://repo.gem5.org/gem5/file/570b44fe6e04/src/arch/x86/isa/insts/general_purpose/arithmetic/add_and_subtract.py#l41)
>> in a loop. On a 2-wide out-of-order machine with enough resources, the
>> IPC should be two at a steady state. However, the IPC only goes up to
>> one. What is happening is that even though the ADDs have two source
>> and one destination registers and a flag to set in x86, gem5 adds one
>> extra flag source register to the ADDs. As a result, each ADD becomes
>> dependent on the earlier ADD's destination flag, constraining the
>> achievable IPC to one.
>>
>> Here is an example sequence with physical register mappings:
>> ADD: S1=98, S2=9, S3=2, D1=82, D2=105 (flag)
>> ADD: S1=92, S2=9, S3=105 (flag), D1=79, D2=90
>> ...
>>
>> Physical registers 98, 9, and 92 are ready when those two ADDs are
>> renamed; however, as you can see, the second ADD has to wait for the
>> first ADD because of the extra flag source register S3. When I removed
>> those flags in the macroop definition, the IPC jumped up from 1 to 1.7.
>>
>> Does anyone know why the ADD has to read the flags, even though the
>> x86 manual does not say that? Those flags should just cause
>> write-after-write dependency, not read-after-write.
>>
>> Yasuko

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
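
The dependency discussed in this thread can be illustrated with a small toy model. The Python sketch below is not gem5 code and does not reproduce genFlags(); `merge_flags` and `simulate_ipc` are hypothetical names, and the model assumes single-cycle ADDs, register sources that are always ready, and unlimited resources apart from issue width. It only shows why retaining unwritten flag bits forces a read of the old flags, and how that extra source serializes otherwise independent ADDs.

```python
def merge_flags(old_flags, new_flags, write_mask):
    """Keep the old value of every bit outside write_mask (assumed behaviour).

    Because bits outside write_mask come from old_flags, the old flag
    register becomes a source operand of the instruction.
    """
    return (old_flags & ~write_mask) | (new_flags & write_mask)


def simulate_ipc(num_adds, width, flags_are_source, latency=1):
    """Toy issue model for a stream of register-independent ADDs."""
    ready_cycle = []   # cycle at which each ADD's result and flags are available
    issued = {}        # cycle -> number of ADDs issued in that cycle
    for i in range(num_adds):
        earliest = 0   # register sources are assumed ready from cycle 0
        if flags_are_source and i > 0:
            # Extra flag source: wait for the previous ADD's flag result.
            earliest = ready_cycle[i - 1]
        cycle = earliest
        while issued.get(cycle, 0) >= width:
            cycle += 1  # no free issue slot in this cycle, try the next one
        issued[cycle] = issued.get(cycle, 0) + 1
        ready_cycle.append(cycle + latency)
    return num_adds / max(ready_cycle)


if __name__ == "__main__":
    print(simulate_ipc(1000, 2, flags_are_source=True))   # flag read-after-write chain
    print(simulate_ipc(1000, 2, flags_are_source=False))  # flags only written
```

In this idealized model the chained case settles at an IPC of one and the unchained case at the issue width. The 1.7 reported in the thread comes from the real simulator and is not something this sketch tries to reproduce.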
