I've only thought about this briefly, but here are a few quick reactions: - PowerPC has updating ld/st instructions too. How are these handled? Whatever we do, we should do the same thing for both.
- How would a real pipeline handle this, particularly for loads? In that case you have two register updates from one instruction, and assuming the datapath is designed for the typical case of one register update per instruction, you may well end up doing something that looks like a poor man's micro-ops anyway (e.g., it seems likely that it would occupy two slots in the writeback stage). And if you want to decouple those two writebacks as much as possible in O3, it's not clear how you'll do that w/o either creating two micro-ops or significantly revamping the cpu/inst interface. So it's not obvious to me that that solution is overkill. - The whole initiateAcc/completeAcc thing came about when we started doing true timing memory accesses; before that we just had the memAccInst->execute() thing that expected to call read() or write() and get an answer back right away. So if we want to make split-phase translation completely visible to the ISA then perhaps we do need to revamp the interface similarly. But I think the issue here is not making the split-phase nature of translation visible, but rather enabling one part of the instruction to writeback speculatively while another part of the instruction is still outstanding and could fault, which again sounds to me a lot like two independent micro-ops. Steve On Sun, Jun 6, 2010 at 8:10 AM, Ali Saidi <[email protected]> wrote: > The problem we are having is something like this. With ARM there are some > incrementing loads/stores that do the following: > addr_tmp = Rm + Rn > Rt = MemoryLoad(addr_tmp) > Rn = addr_tmp > > In the case of the atomic simple cpu this works fine because the Rn = addr > is protected by a if (fault == NoFault) from the MemoryLoad. > > For the timing CPUs, all of the above code is in the initateAcc(). With the > new delayed translation code MemoryLoad() will always return NoFault so the > Rn = addr code will always be executed. With the O3 this appears like it will > work since upon taking a fault on the instruction, none of updated state will > be saved, however with the timing simple cpu there isn't any duplicate state, > so the Rn = addr is executed and later when the timing translation completes > and takes a fault Rn is already updated and after the fault is handled > and the instruction is re-executed it doesn't work so well. > > There seem to be three solutions. > 1) Break the instruction into micro-ops, one that does the load and the other > that does the writing of Rn. This is Ok, but it really seems like overkill. > 2) Move the Rn = addr part to completeAcc(). This isn't at all ideal because > while it would solve the problem for the timing simple cpu, it would make a > tight loop without and loop carried dependencies only able to issue one > outstanding load at a time with the o3 cpu. > 3) Some how restructure the timing simple cpu code to do something more like > the way the out-of-order cpu handles these issues, although I don't know > about the implication of keeping around temporary state. Something along > these lines seems ideal, since I don't think we really want completely > different methods to handle these issues in the different CPU classes. > 3a) One way to handle this might be instead of having the translation return > NoFault when the transaction is delayed, return ReplayFault or something > similar that would re-execute initateAcc(). This time the translation should > be cached and thus immediately available so it can return the proper fault or > NoFault. > > Anyone have some comments? > > Thanks, > Ali > > > On Jun 5, 2010, at 8:10 PM, Steve Reinhardt wrote: > >> I'd have to look more closely to decide... I like having the common >> parts of simple-atomic and simple-timing factored out into the base >> class, so I'd hate to just completely throw that away unless it's >> really justified, but it's quite possible that the partition/interface >> between the base class and the derived classes needs to be rethought. >> >> Steve >> >> On Sat, Jun 5, 2010 at 5:18 PM, Gabriel Michael Black >> <[email protected]> wrote: >>> Maybe it should be separated from the simple atomic CPU again? That would >>> have made sense back when they were mostly the same, but they might be >>> different enough now that it just adds to the confusion. I'm not advocating >>> that necessarily, but it sounds like it could help. >>> >>> Gabe >>> >>> Quoting Steve Reinhardt <[email protected]>: >>> >>>> It could use some cleanup. The idea has always been that it's the >>>> simplest CPU that can drive the timing memory system, so there's no >>>> guarantee it will be simple in an absolute sense. But I'm sure it >>>> could be simpler if we worked at it. I think the combination of >>>> dealing with unaligned accesses and split-phase translations has >>>> definitely hit it hard. >>>> >>>> Steve >>>> >>>> On Sat, Jun 5, 2010 at 1:45 PM, Ali Saidi <[email protected]> wrote: >>>>> >>>>> My comment was more that we might want to re-evaluate the timing simple >>>>> cpu as it's left the simple bit a while ago. Many of the fault == NoFault >>>>> checks don't matter with the translation state change since it's always >>>>> going to return no fault. >>>>> >>>>> >>>>> I was hoping to get a discussion going about what if anything we should >>>>> do. >>>>> >>>>> Ali >>>>> >>>>> On Jun 4, 2010, at 5:29 PM, nathan binkert wrote: >>>>> >>>>>>> On a somewhat related note, it seems like the timing simple cpu control >>>>>>> flow has gotten pretty messy with the addition of the translation state >>>>>>> code. To figure out what is going on you really need a large white >>>>>>> board >>>>>>> and a lot of time to map out the call tree and what happens on faults >>>>>>> and >>>>>>> the like. >>>>>> >>>>>> Ali, >>>>>> >>>>>> If it is correct, can you commit the patch? If you're not the right >>>>>> person, Gabe? >>>>>> >>>>>> Nate >>>>>> _______________________________________________ >>>>>> m5-dev mailing list >>>>>> [email protected] >>>>>> http://m5sim.org/mailman/listinfo/m5-dev >>>>>> >>>>> >>>>> _______________________________________________ >>>>> m5-dev mailing list >>>>> [email protected] >>>>> http://m5sim.org/mailman/listinfo/m5-dev >>>>> >>>> _______________________________________________ >>>> m5-dev mailing list >>>> [email protected] >>>> http://m5sim.org/mailman/listinfo/m5-dev >>>> >>> >>> >>> _______________________________________________ >>> m5-dev mailing list >>> [email protected] >>> http://m5sim.org/mailman/listinfo/m5-dev >>> >> _______________________________________________ >> m5-dev mailing list >> [email protected] >> http://m5sim.org/mailman/listinfo/m5-dev >> > > _______________________________________________ > m5-dev mailing list > [email protected] > http://m5sim.org/mailman/listinfo/m5-dev > _______________________________________________ m5-dev mailing list [email protected] http://m5sim.org/mailman/listinfo/m5-dev
