The problem we are having is something like this. With ARM there are some incrementing loads/stores that do the following: addr_tmp = Rm + Rn Rt = MemoryLoad(addr_tmp) Rn = addr_tmp
In the case of the atomic simple cpu this works fine because the Rn = addr is protected by a if (fault == NoFault) from the MemoryLoad. For the timing CPUs, all of the above code is in the initateAcc(). With the new delayed translation code MemoryLoad() will always return NoFault so the Rn = addr code will always be executed. With the O3 this appears like it will work since upon taking a fault on the instruction, none of updated state will be saved, however with the timing simple cpu there isn't any duplicate state, so the Rn = addr is executed and later when the timing translation completes and takes a fault Rn is already updated and after the fault is handled and the instruction is re-executed it doesn't work so well. There seem to be three solutions. 1) Break the instruction into micro-ops, one that does the load and the other that does the writing of Rn. This is Ok, but it really seems like overkill. 2) Move the Rn = addr part to completeAcc(). This isn't at all ideal because while it would solve the problem for the timing simple cpu, it would make a tight loop without and loop carried dependencies only able to issue one outstanding load at a time with the o3 cpu. 3) Some how restructure the timing simple cpu code to do something more like the way the out-of-order cpu handles these issues, although I don't know about the implication of keeping around temporary state. Something along these lines seems ideal, since I don't think we really want completely different methods to handle these issues in the different CPU classes. 3a) One way to handle this might be instead of having the translation return NoFault when the transaction is delayed, return ReplayFault or something similar that would re-execute initateAcc(). This time the translation should be cached and thus immediately available so it can return the proper fault or NoFault. Anyone have some comments? Thanks, Ali On Jun 5, 2010, at 8:10 PM, Steve Reinhardt wrote: > I'd have to look more closely to decide... I like having the common > parts of simple-atomic and simple-timing factored out into the base > class, so I'd hate to just completely throw that away unless it's > really justified, but it's quite possible that the partition/interface > between the base class and the derived classes needs to be rethought. > > Steve > > On Sat, Jun 5, 2010 at 5:18 PM, Gabriel Michael Black > <[email protected]> wrote: >> Maybe it should be separated from the simple atomic CPU again? That would >> have made sense back when they were mostly the same, but they might be >> different enough now that it just adds to the confusion. I'm not advocating >> that necessarily, but it sounds like it could help. >> >> Gabe >> >> Quoting Steve Reinhardt <[email protected]>: >> >>> It could use some cleanup. The idea has always been that it's the >>> simplest CPU that can drive the timing memory system, so there's no >>> guarantee it will be simple in an absolute sense. But I'm sure it >>> could be simpler if we worked at it. I think the combination of >>> dealing with unaligned accesses and split-phase translations has >>> definitely hit it hard. >>> >>> Steve >>> >>> On Sat, Jun 5, 2010 at 1:45 PM, Ali Saidi <[email protected]> wrote: >>>> >>>> My comment was more that we might want to re-evaluate the timing simple >>>> cpu as it's left the simple bit a while ago. Many of the fault == NoFault >>>> checks don't matter with the translation state change since it's always >>>> going to return no fault. >>>> >>>> >>>> I was hoping to get a discussion going about what if anything we should >>>> do. >>>> >>>> Ali >>>> >>>> On Jun 4, 2010, at 5:29 PM, nathan binkert wrote: >>>> >>>>>> On a somewhat related note, it seems like the timing simple cpu control >>>>>> flow has gotten pretty messy with the addition of the translation state >>>>>> code. To figure out what is going on you really need a large white >>>>>> board >>>>>> and a lot of time to map out the call tree and what happens on faults >>>>>> and >>>>>> the like. >>>>> >>>>> Ali, >>>>> >>>>> If it is correct, can you commit the patch? If you're not the right >>>>> person, Gabe? >>>>> >>>>> Nate >>>>> _______________________________________________ >>>>> m5-dev mailing list >>>>> [email protected] >>>>> http://m5sim.org/mailman/listinfo/m5-dev >>>>> >>>> >>>> _______________________________________________ >>>> m5-dev mailing list >>>> [email protected] >>>> http://m5sim.org/mailman/listinfo/m5-dev >>>> >>> _______________________________________________ >>> m5-dev mailing list >>> [email protected] >>> http://m5sim.org/mailman/listinfo/m5-dev >>> >> >> >> _______________________________________________ >> m5-dev mailing list >> [email protected] >> http://m5sim.org/mailman/listinfo/m5-dev >> > _______________________________________________ > m5-dev mailing list > [email protected] > http://m5sim.org/mailman/listinfo/m5-dev > _______________________________________________ m5-dev mailing list [email protected] http://m5sim.org/mailman/listinfo/m5-dev
