If nobody has anything to say about the issue itself, letting me know which part of my ramblings is the least comprehensible would also be helpful.
Gabe

Gabriel Michael Black wrote:
> My little diagram was missing a few "*"s. Here's a corrected version.
> The "*"s after Exec and completeAcc are for faults that would happen
> on the way back to PreInst.
>
> Gabe
>
> (PreInst)-*->(Fetch)/--------------------------->\/-(Exec)*
>                     \--->(ITLB)-*|->(I cache)-|->/\-(initiateAcc)-*->
>                                                      (DTLB)-*|->(D cache)-|->(completeAcc)*
>
> Quoting Gabriel Michael Black <[email protected]>:
>
>> Quoting Steve Reinhardt <[email protected]>:
>>
>>> On Sun, May 3, 2009 at 12:09 AM, Gabe Black <[email protected]> wrote:
>>>
>>>> While this does avoid the segfault, it also causes some other bug which
>>>> crashes just about any of the simple timing regressions. I hadn't
>>>> actually tried any of the quick regressions when I sent that out, since
>>>> my other testing had tricked me into thinking everything was fine. I
>>>> think it has something to do with faulting accesses not dealing with the
>>>> fault right away and instead continuing into the remainder of
>>>> completeIfetch. Rather than try to bandaid this into working, I'm
>>>> just going to go for it and see what reorganizing the code buys me.
>>>
>>> It seems like anything that uses the timing-mode translation would have to
>>> be prepared to not know whether a translation succeeds or not until a later
>>> event is scheduled.... are you saying that this change exposes a fundamental
>>> problem in the structure of the simple timing CPU with regard to how it
>>> deals with timing-mode translation? That's what it sounds like to me, but I
>>> just wanted to clarify.
>>>
>>> Thanks,
>>>
>>> Steve
>>>
>> Fundamental is probably too strong a word. Ingrained is probably
>> better. The simple timing CPU is now pretty different from how it
>> started life, and I think that shows a bit.
>> It's been split off of
>> atomic, has delayable translation, microcode, unaligned accesses,
>> variable instruction sizes, memory mapped registers, and there may be
>> other things I'm forgetting. Those things have been folded in and are
>> working, but I think a lot of the complexity is that the code wasn't
>> designed to accommodate them originally.
>>
>> This is actually a good opportunity to discuss how the timing CPU is
>> put together and what I'd like to do with it. To start, this is what
>> the lifetime of an instruction looks like. Points where the flow may
>> be delayed using the event queue are marked with "|". Points where the
>> flow may be halted by a fault are marked with a "*". This will
>> probably also look like garbage without fixed width fonts.
>>
>> (PreInst)-*->(Fetch)/--------------------------->\/-(Exec)
>>                     \--->(ITLB)-*|->(I cache)-|->/\-(initiateAcc)--->
>>                                                      (DTLB)-*|->(D cache)-|->(completeAcc)
>>
>> The problem we started with comes from initiateAcc going directly into
>> the DTLB portion without finish. Generally, we can run into problems
>> where we go through this process avoiding all the "|"s, or by
>> coincidence not delaying on them, and get farther and farther ahead of
>> ourselves and/or build up a deeper and deeper pile of cruft on the
>> stack. If a macroop is implemented, for some reason, to loop around
>> and around inside itself waiting for, for instance, an interrupt to
>> happen, all "|"s would be skipped and the call stack would build until
>> it overflowed. What I would like to do, then, is structure the code so
>> that calls never venture too far from their origin and return home
>> before starting the next task.
>>
>> To get there, there are several types of control flow to consider.
>> 1. The end of the instruction, where control loops back to PreInst
>> (which checks for interrupts and PC-related events).
>> 2. A fault which is invoked and returns to PreInst.
>> 3.
>> A potential delay which doesn't happen, which needs to fall back to
>> the flow so that it can continue to the next step.
>> 4. A potential delay which -does- happen, which needs to fall back to
>> the flow and then fall out of it so that the delay can happen in the
>> event queue.
>> 5. The flow being continued because whatever the CPU was waiting for
>> has happened.
>>
>> As I said, the way this works now is that each step calls the
>> next if it should happen immediately, and otherwise the callback after
>> the delay starts things up again. That has the nice property of
>> localizing a lot of things to the point where they're relevant, like
>> checking for interrupts, and that the different pieces can be started
>> whenever is convenient. I've talked about the problems at length.
>>
>> Instead of having every part call the following parts, what I'd like
>> to do is have a function which can be stopped and started at will
>> and which calls all the component operations as child peers.
>> Unfortunately, it's been really difficult coming up with something
>> that can do this efficiently and which provides an efficient mechanism
>> for all the possible forms of control flow I listed above.
>>
>> One idea I had was to set up a switch statement where each phase of
>> the execution flow was a case. Cases would not have breaks between
>> them, so that if execution should continue it would flow right into the
>> next. The individual phases could be jumped to directly to allow
>> restarting things after some sort of delay.
>>
>> There are three major problems with this approach, though. First, the
>> execution flow as shown is not linear, so it can't be implemented
>> directly as a single chain of events with no control flow. Second, it
>> pulls decisions away from where they'd be made locally, i.e. checking
>> whether to actually do a fetch for whatever reason away from where the
>> fetch would start.
>> Third, it provides no easy way to stop in the
>> middle of things to handle a fault without constantly checking if
>> there's one to deal with.
>>
>> In order to allow faults I was thinking of some sort of try/catch
>> mechanism, but that just seems ugly.
>>
>> The point of all this is, I think the way the CPU is built is broken
>> as a result of significant feature creep. I think conceptually my way
>> is better, but I'm having a hard time figuring out how to actually
>> implement it without it turning into a big ugly mess. If anybody has a
>> suggestion for how to make this work, please let me know.
>>
>> Gabe

_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
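[Editor's note: for concreteness, the restartable-switch idea discussed above can be sketched roughly as below. All names here (Phase, TinyCpu, step, advance) are invented for illustration and are not the real M5 SimpleTimingCPU code; the sketch only models the three behaviors the thread describes: cases falling through when execution continues immediately, the driver returning out at a "|" point so a scheduled event can re-enter at the recorded phase, and a "*" point unwinding back to PreInst.]

```cpp
#include <cassert>
#include <deque>
#include <functional>

// Hypothetical phases of the instruction lifetime, flattened to a line
// for simplicity (the real flow forks for memory instructions).
enum class Phase { PreInst, Fetch, Exec, Mem, Done };

struct TinyCpu {
    Phase phase = Phase::PreInst;
    bool faulted = false;
    std::deque<std::function<void()>> events;  // stand-in event queue

    // One step of the flow. Returns true if flow continues immediately
    // (fall through to the next case), false if it was delayed (a "|"
    // point: an event is scheduled to call advance() later) or a fault
    // (a "*" point) redirected us back to PreInst.
    bool step(Phase next, bool delayed, bool fault) {
        if (fault) {
            faulted = true;
            phase = Phase::PreInst;
            return false;
        }
        phase = next;
        if (delayed) {
            events.push_back([this] { advance(); });
            return false;
        }
        return true;
    }

    // A single driver owns the whole flow; each call does bounded work
    // and returns, so the stack can never build up the way the nested
    // call-the-next-step scheme allows.
    void advance() {
        switch (phase) {
        case Phase::PreInst:
            if (!step(Phase::Fetch, /*delayed=*/false, /*fault=*/false))
                return;
            [[fallthrough]];
        case Phase::Fetch:   // pretend the I-cache access always delays
            if (!step(Phase::Exec, /*delayed=*/true, /*fault=*/false))
                return;
            [[fallthrough]];
        case Phase::Exec:
            if (!step(Phase::Mem, /*delayed=*/false, /*fault=*/false))
                return;
            [[fallthrough]];
        case Phase::Mem:
            phase = Phase::Done;
            [[fallthrough]];
        case Phase::Done:
            return;
        }
    }
};
```

Calling advance() on a fresh TinyCpu runs PreInst and Fetch, then returns because the I-cache "delay" scheduled a resume event; running that event re-enters the switch at Exec and carries the flow to Done. The non-linear flow and the locality problems Gabe raises are real: this sketch works only because the phases happen to form a straight line.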
