If nobody has anything to say about the issue itself, letting me know which part of my ramblings is the least comprehensible would also be helpful.
Gabe

Gabriel Michael Black wrote:
> My little diagram was missing a few "*"s. Here's a corrected version.
> The "*"s after Exec and completeAcc are for faults that would happen
> on the way back to PreInst.
>
> Gabe
>
> (PreInst)-*->(Fetch)/--------------------------->\/-(Exec)*
>                     \--->(ITLB)-*|->(I cache)-|->/\-(initiateAcc)-*->
>                                                      (DTLB)-*|->(D cache)-|->(completeAcc)*
>
> Quoting Gabriel Michael Black <[email protected]>:
>
>> Quoting Steve Reinhardt <[email protected]>:
>>
>>> On Sun, May 3, 2009 at 12:09 AM, Gabe Black <[email protected]> wrote:
>>>
>>>> While this does avoid the segfault, it also causes some other bug which
>>>> crashes just about any of the simple timing regressions. I hadn't
>>>> actually tried any of the quick regressions when I sent that out, since
>>>> my other testing had tricked me into thinking everything was fine. I
>>>> think it has something to do with faulting accesses not dealing with the
>>>> fault right away and instead continuing into the remainder of
>>>> completeIfetch. Rather than try to bandaid this into working, I'm
>>>> just going to go for it and see what reorganizing the code buys me.
>>>
>>> It seems like anything that uses the timing-mode translation would have to
>>> be prepared to not know whether a translation succeeds or not until a later
>>> event is scheduled.... are you saying that this change exposes a fundamental
>>> problem in the structure of the simple timing CPU with regard to how it
>>> deals with timing-mode translation? That's what it sounds like to me, but I
>>> just wanted to clarify.
>>>
>>> Thanks,
>>>
>>> Steve
>>>
>> Fundamental is probably too strong a word. Ingrained is probably
>> better. The simple timing CPU is now pretty different from how it
>> started life, and I think that shows a bit.
>> It's been split off of
>> atomic, has delayable translation, microcode, unaligned accesses,
>> variable instruction sizes, memory mapped registers, and there may be
>> other things I'm forgetting. Those things have been folded in and are
>> working, but I think a lot of the complexity is that the code wasn't
>> designed to accommodate them originally.
>>
>> This is actually a good opportunity to discuss how the timing CPU is
>> put together and what I'd like to do with it. To start, this is what
>> the lifetime of an instruction looks like. Points where the flow may
>> be delayed using the event queue are marked with "|". Points where the
>> flow may be halted by a fault are marked with a "*". This will
>> probably also look like garbage without fixed width fonts.
>>
>> (PreInst)-*->(Fetch)/--------------------------->\/-(Exec)
>>                     \--->(ITLB)-*|->(I cache)-|->/\-(initiateAcc)--->
>>                                                      (DTLB)-*|->(D cache)-|->(completeAcc)
>>
>> The problem we started with comes from initiateAcc going directly into
>> the DTLB portion without finish. Generally, we can run into problems
>> where we go through this process avoiding all the "|"s, or by
>> coincidence not delaying on them, and get farther and farther ahead of
>> ourselves and/or build up a deeper and deeper pile of cruft on the
>> stack. If a macroop is implemented, for some reason, to loop around
>> and around inside itself waiting for, for instance, an interrupt to
>> happen, all "|"s would be skipped and the call stack would build until
>> it overflowed. What I would like to do, then, is structure the code so
>> that calls never venture too far from their origin and return home
>> before starting the next task.
>>
>> To get there, there are several types of control flow to consider.
>> 1. The end of the instruction, where control loops back to PreInst
>> (which checks for interrupts and PC-related events).
>> 2. A fault which is invoked and returns to PreInst.
>> 3.
>> A potential delay which doesn't happen, which needs to fall back to
>> the flow so that it can continue to the next step.
>> 4. A potential delay which -does- happen, which needs to fall back to
>> the flow and then fall out of it so that the delay can happen in the
>> event queue.
>> 5. The flow being continued because whatever the CPU was waiting for
>> has happened.
>>
>> As I said, the way this works now is that each step calls the
>> next if it should happen immediately, and otherwise the callback after
>> the delay starts things up again. That has the nice property of
>> localizing a lot of things to the point where they're relevant, like
>> checking for interrupts, and that the different pieces can be started
>> whenever is convenient. I've talked about the problems at length.
>>
>> Instead of having every part call the following parts, what I'd like
>> to do is have a function which can be stopped and started at will
>> and which calls all the component operations as child peers.
>> Unfortunately, it's been really difficult coming up with something
>> that can do this efficiently and which provides an efficient mechanism
>> for all the possible forms of control flow I listed above.
>>
>> One idea I had was to set up a switch statement where each phase of
>> the execution flow was a case. Cases would not have breaks between
>> them, so that if execution should continue it would flow right into the
>> next. The individual phases could be jumped to directly to allow
>> restarting things after some sort of delay.
>>
>> There are three major problems with this approach, though. First, the
>> execution flow as shown is not linear, so it can't be implemented
>> directly as a single chain of events with no control flow. Second, it
>> pulls decisions away from where they'd be made locally, i.e. checking
>> whether to actually do a fetch for whatever reason away from where the
>> fetch would start.
>> Third, it provides no easy way to stop in the
>> middle of things to handle a fault without constantly checking if
>> there's one to deal with.
>>
>> In order to allow faults I was thinking of some sort of try/catch
>> mechanism, but that just seems ugly.
>>
>> The point of all this is, I think the way the CPU is built is broken
>> as a result of significant feature creep. I think conceptually my way
>> is better, but I'm having a hard time figuring out how to actually
>> implement it without it turning into a big ugly mess. If anybody has a
>> suggestion for how to make this work, please let me know.
>>
>> Gabe

_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
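[Editor's note: for concreteness, the restartable-switch idea discussed above can be sketched roughly as below. All names here (Phase, TinyCpu, step, advance) are invented for illustration and are not the real M5 SimpleTimingCPU code; the sketch only models the three behaviors the thread describes: cases falling through when execution continues immediately, the driver returning out at a "|" point so a scheduled event can re-enter at the recorded phase, and a "*" point unwinding back to PreInst.]

```cpp
#include <cassert>
#include <deque>
#include <functional>

// Hypothetical phases of the instruction lifetime, flattened to a line
// for simplicity (the real flow forks for memory instructions).
enum class Phase { PreInst, Fetch, Exec, Mem, Done };

struct TinyCpu {
    Phase phase = Phase::PreInst;
    bool faulted = false;
    std::deque<std::function<void()>> events;  // stand-in event queue

    // One step of the flow. Returns true if flow continues immediately
    // (fall through to the next case), false if it was delayed (a "|"
    // point: an event is scheduled to call advance() later) or a fault
    // (a "*" point) redirected us back to PreInst.
    bool step(Phase next, bool delayed, bool fault) {
        if (fault) {
            faulted = true;
            phase = Phase::PreInst;
            return false;
        }
        phase = next;
        if (delayed) {
            events.push_back([this] { advance(); });
            return false;
        }
        return true;
    }

    // A single driver owns the whole flow; each call does bounded work
    // and returns, so the stack can never build up the way the nested
    // call-the-next-step scheme allows.
    void advance() {
        switch (phase) {
        case Phase::PreInst:
            if (!step(Phase::Fetch, /*delayed=*/false, /*fault=*/false))
                return;
            [[fallthrough]];
        case Phase::Fetch:   // pretend the I-cache access always delays
            if (!step(Phase::Exec, /*delayed=*/true, /*fault=*/false))
                return;
            [[fallthrough]];
        case Phase::Exec:
            if (!step(Phase::Mem, /*delayed=*/false, /*fault=*/false))
                return;
            [[fallthrough]];
        case Phase::Mem:
            phase = Phase::Done;
            [[fallthrough]];
        case Phase::Done:
            return;
        }
    }
};
```

Calling advance() on a fresh TinyCpu runs PreInst and Fetch, then returns because the I-cache "delay" scheduled a resume event; running that event re-enters the switch at Exec and carries the flow to Done. The non-linear flow and the locality problems Gabe raises are real: this sketch works only because the phases happen to form a straight line.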
