My little diagram was missing a few "*"s. Here's a corrected version.
The "*"s after Exec and completeAcc are for faults that would happen
on the way back to PreInst.
Gabe
(PreInst)-*->(Fetch)/--------------------------->\/-(Exec)*
\--->(ITLB)-*|->(I cache)-|->/\-(initiateAcc)-*->
(DTLB)-*|->(D cache)-|->(completeAcc)*
Quoting Gabriel Michael Black <[email protected]>:
> Quoting Steve Reinhardt <[email protected]>:
>
>> On Sun, May 3, 2009 at 12:09 AM, Gabe Black <[email protected]> wrote:
>>
>>> While this does avoid the segfault, it also causes some other bug which
>>> crashes just about any of the simple timing regressions. I hadn't
>>> actually tried any of the quick regressions when I sent that out since
>>> my other testing had tricked me into thinking everything was fine. I
>>> think it has something to do with faulting accesses not dealing with the
>>> fault right away and instead continuing into the remainder of
>>> completeIfetch. Rather than try to band-aid this into working, I'm
>>> thinking I'll just go for it and see what reorganizing
>>> the code buys me.
>>>
>>
>> It seems like anything that uses the timing-mode translation would have to
>> be prepared to not know whether a translation succeeds or not until a later
>> event is scheduled.... are you saying that this change exposes a fundamental
>> problem in the structure of the simple timing cpu with regard to how it
>> deals with timing-mode translation? That's what it sounds like to me, but I
>> just wanted to clarify.
>>
>> Thanks,
>>
>> Steve
>>
>
> Fundamental is probably too strong a word. Ingrained is probably
> better. The simple timing CPU is now pretty different from how it
> started life, and I think that shows a bit. It's been split off of
> atomic, has delayable translation, microcode, unaligned accesses,
> variable instruction sizes, memory mapped registers, and there may be
> other things I'm forgetting. Those things have been folded in and are
> working, but I think a lot of the complexity comes from the code not
> being designed to accommodate them originally.
>
> This is actually a good opportunity to discuss how the timing CPU is
> put together and what I'd like to do with it. To start, this is what
> the lifetime of an instruction looks like. Points where the flow may
> be delayed using the event queue are marked with "|". Points where the
> flow may be halted by a fault are marked with a "*". This will
> probably also look like garbage without fixed-width fonts.
>
> (PreInst)-*->(Fetch)/--------------------------->\/-(Exec)
> \--->(ITLB)-*|->(I cache)-|->/\-(initiateAcc)--->
>
> (DTLB)-*|->(D cache)-|->(completeAcc)
>
> The problem we started with comes from initiateAcc going directly into
> the DTLB portion without finishing first. More generally, whenever we
> go through this process avoiding all the "|"s, or by coincidence never
> delaying on them, we get farther and farther ahead of ourselves and/or
> build up a deeper and deeper pile of cruft on the call stack. If a
> macroop is implemented, for some reason, to loop around and around
> inside itself waiting for something like an interrupt to happen, all
> the "|"s would be skipped and the call stack would grow until it
> overflowed. What I would like to do, then, is structure the code so
> that calls never venture too far from their origin and return home
> before starting the next task.
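>
> To make the stack problem concrete, here's a rough sketch (the function
> names are made up for illustration, not the actual TimingSimpleCPU code)
> of what the chained-call structure boils down to when nothing ever has
> to wait:
>
> // Minimal sketch of the current chained-call style.  Every stage calls
> // the next directly when no event needs to be scheduled, so a macroop
> // that loops inside itself without hitting a "|" just keeps deepening
> // the call stack.
> #include <cstdio>
>
> static int depth = 0;           // stands in for the real call stack depth
>
> void preInst();                 // forward declaration: execute() loops back here
>
> void execute()        { preInst(); }         // looping macroop: straight back to the top
> void completeIfetch() { execute(); }
> void fetch()          { completeIfetch(); }  // ITLB and I-cache "hit" inline, no delay
>
> void preInst()
> {
>     if (++depth > 5) {          // cut the demo off before a real overflow
>         std::printf("stack is %d frames deep and still growing\n", depth);
>         return;
>     }
>     fetch();
> }
>
> int main()
> {
>     preInst();                  // with no delays, nothing ever unwinds
>     return 0;
> }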
>
> To get there, there are several types of control flow to consider.
> 1. The end of the instruction where control loops back to PreInst
> (which checks for interrupts and pc related events)
> 2. A fault which is invoked and returns to PreInst.
> 3. A potential delay which doesn't happen, which needs to fall back to
> the flow so that it can continue to the next step.
> 4. A potential delay which -does- happen, which needs to fall back to
> the flow and then fall out of it so that the delay can happen on the
> event queue.
> 5. The flow being continued because whatever the CPU was waiting for
> has happened.
>
> As I said, the way that this works now is that each step calls the
> next if it should happen immediately, and otherwise the callback after
> the delay starts things up again. That has the nice property of
> localizing a lot of things to the point where they're relevant, like
> checking for interrupts, and of letting the different pieces be started
> whenever is convenient. I've talked about the problems at length.
>
> Instead of having every part call the following parts, what I'd like
> to do is have a function which can be stopped and started at will
> and which calls all the component operations as peers of each other.
> Unfortunately, it's been really difficult coming up with something
> that can do this efficiently and which provides an efficient mechanism
> for all the possible forms of control flow I listed above.
>
> One idea I had was to set up a switch statement where each phase of
> the execution flow was a case. Cases would not have breaks between
> them, so if execution should continue it would fall right through into
> the next. The individual phases could also be jumped to directly to
> allow restarting things after some sort of delay.
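>
> As a rough sketch of what I mean (the Phase names and the InstFlow
> struct here are made up for illustration, not a proposed interface):
>
> #include <cstdio>
>
> enum Phase { PreInst, Fetch, Exec, MemAccess, Done };
>
> struct InstFlow {
>     Phase resumeAt = PreInst;
>     bool needDelay = false;     // would really come from the TLB/cache ports
>
>     // Drive the instruction as far as it can go.  Returns true if it had
>     // to stop and wait; the event queue callback would then call advance()
>     // again and the switch jumps straight back to the saved phase.
>     bool advance()
>     {
>         switch (resumeAt) {
>           case PreInst:
>             std::puts("check for interrupts and PC events");
>             [[fallthrough]];
>           case Fetch:
>             std::puts("translate and fetch");
>             if (needDelay) {        // e.g. the I-cache access takes time
>                 resumeAt = Exec;    // remember where to pick up later
>                 return true;        // unwind out to the event queue
>             }
>             [[fallthrough]];
>           case Exec:
>             std::puts("execute");
>             [[fallthrough]];
>           case MemAccess:
>             std::puts("initiate/complete any memory access");
>             [[fallthrough]];
>           case Done:
>             resumeAt = Done;
>         }
>         return false;
>     }
> };
>
> int main()
> {
>     InstFlow flow;
>     flow.needDelay = true;
>     if (flow.advance())     // stops at the fetch "delay"
>         flow.advance();     // "callback" resumes at Exec and runs to the end
>     return 0;
> }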
>
> There are three major problems with this approach though. First, the
> execution flow as shown is not linear, so it can't be implemented
> directly as a single chain of events with no control flow. Second, it
> pulls decisions away from where they'd naturally be made locally; for
> example, the check of whether to actually do a fetch for whatever reason
> would move away from where the fetch itself starts. Third, it provides
> no easy way to stop in the
> middle of things to handle a fault without constantly checking if
> there's one to deal with.
>
> In order to allow faults I was thinking of some sort of try/catch
> mechanism, but that just seems ugly.
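>
> Just to show what I mean (this FaultException type is purely a stand-in
> for the sketch, not m5's actual Fault type), it would be something like:
>
> #include <cstdio>
> #include <stdexcept>
>
> // Any phase can bail out by throwing; the per-instruction driver catches
> // the exception and redirects the flow back to PreInst, so none of the
> // intermediate steps have to check a fault return value themselves.
> struct FaultException : std::runtime_error {
>     using std::runtime_error::runtime_error;
> };
>
> void translateFetchAddr(bool pageFault)
> {
>     if (pageFault)
>         throw FaultException("ITLB fault");
>     std::puts("translation succeeded");
> }
>
> void runOneInst(bool pageFault)
> {
>     try {
>         translateFetchAddr(pageFault);
>         std::puts("fetch, execute, ...");
>     } catch (const FaultException &f) {
>         // invoke the fault, then start over at PreInst on the next cycle
>         std::printf("handling %s, restarting at PreInst\n", f.what());
>     }
> }
>
> int main()
> {
>     runOneInst(true);   // faulting path
>     runOneInst(false);  // normal path
>     return 0;
> }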
>
> The point of all this is, I think the way the CPU is built is broken
> as a result of significant feature creep. I think conceptually my way
> is better, but I'm having a hard time figuring out how to actually
> implement it without it turning into a big ugly mess. If anybody has a
> suggestion for how to make this work, please let me know.
>
> Gabe
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev