If nobody has anything to say about the issue itself, letting me know
which part of my ramblings is the least comprehensible would also be
helpful.

Gabe

Gabriel Michael Black wrote:
> My little diagram was missing a few "*"s. Here's a corrected version.  
> The "*"s after Exec and completeAcc are for faults that would happen  
> on the way back to PreInst.
>
> Gabe
>
> (PreInst)-*->(Fetch)/--------------------------->\/-(Exec)*
>                      \--->(ITLB)-*|->(I cache)-|->/\-(initiateAcc)-*->
>
> (DTLB)-*|->(D cache)-|->(completeAcc)*
>
>
> Quoting Gabriel Michael Black <[email protected]>:
>
>   
>> Quoting Steve Reinhardt <[email protected]>:
>>
>>     
>>> On Sun, May 3, 2009 at 12:09 AM, Gabe Black <[email protected]> wrote:
>>>
>>>       
>>>> While this does avoid the segfault, it also causes some other bug which
>>>> crashes just about any of the simple timing regressions. I hadn't
>>>> actually tried any of the quick regressions when I sent that out since
>>>> my other testing had tricked me into thinking everything was fine. I
>>>> think it has something to do with faulting accesses not dealing with the
>>>> fault right away and instead continuing into the remainder of
>>>> completeIfetch. Rather than try to band-aid this into working, I'm
>>>> thinking I'll just go for it and see what reorganizing the code buys
>>>> me.
>>>>
>>>>         
>>> It seems like anything that uses the timing-mode translation would
>>> have to be prepared not to know whether a translation succeeds until
>>> a later event is scheduled... are you saying that this change exposes
>>> a fundamental problem in the structure of the simple timing CPU with
>>> regard to how it deals with timing-mode translation? That's what it
>>> sounds like to me, but I just wanted to clarify.
>>>
>>> Thanks,
>>>
>>> Steve
>>>
>>>       
>> Fundamental is probably too strong a word. Ingrained is probably
>> better. The simple timing CPU is now pretty different from how it
>> started life, and I think that shows a bit. It's been split off of
>> atomic, has delayable translation, microcode, unaligned accesses,
>> variable instruction sizes, memory mapped registers, and there may be
>> other things I'm forgetting. Those things have been folded in and are
>> working, but I think a lot of the complexity comes from the fact that
>> the code wasn't originally designed to accommodate them.
>>
>> This is actually a good opportunity to discuss how the timing CPU is
>> put together and what I'd like to do with it. To start, this is what
>> the lifetime of an instruction looks like. Points where the flow may
>> be delayed using the event queue are marked with "|". Points where the
>> flow may be halted by a fault are marked with a "*". This will
>> probably also look like garbage without fixed-width fonts.
>>
>> (PreInst)-*->(Fetch)/--------------------------->\/-(Exec)
>>                      \--->(ITLB)-*|->(I cache)-|->/\-(initiateAcc)--->
>>
>> (DTLB)-*|->(D cache)-|->(completeAcc)
>>
>> The problem we started with comes from initiateAcc going directly
>> into the DTLB portion without finishing first. Generally, we can run
>> into problems where we go through this process avoiding all the "|"s,
>> or by coincidence not delaying on them, and get further and further
>> ahead of ourselves and/or build up a deeper and deeper pile of cruft
>> on the stack. If a macroop is implemented, for whatever reason, to
>> loop around and around inside itself waiting for something like an
>> interrupt to happen, all the "|"s would be skipped and the call stack
>> would grow until it overflowed. What I would like to do, then, is
>> structure the code so that calls never venture too far from their
>> origin and return home before starting the next task.
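>>
>> To make the stack problem concrete, here's a rough sketch of the
>> call-the-next-step style as plain functions. The names are made up
>> for illustration; this isn't the actual CPU code:
>>
>> // Hypothetical sketch of the current call-chained style. If no step
>> // ever schedules an event, each instruction adds frames to the stack.
>> void preInst();
>> void fetch();
>> void exec();
>> bool startItbTranslation(); // true if an event was scheduled
>>
>> void
>> preInst()
>> {
>>     // check for interrupts and PC events, then start the next fetch
>>     fetch();
>> }
>>
>> void
>> fetch()
>> {
>>     if (startItbTranslation()) // delayed: the event calls exec()
>>         return;                // later and our stack unwinds now
>>     exec();                    // not delayed: keep going, same stack
>> }
>>
>> void
>> exec()
>> {
>>     // ... execute, possibly initiateAcc()/completeAcc() ...
>>     preInst();                 // loop back for the next instruction;
>> }                              // this is where the recursion builds
>>
>> In the looping macroop case, preInst -> fetch -> exec -> preInst
>> never returns through a scheduled event, so the stack just grows.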
>>
>> To get there, there are several types of control flow to consider
>> (see the sketch after this list):
>> 1. The end of the instruction, where control loops back to PreInst
>> (which checks for interrupts and PC related events).
>> 2. A fault which is invoked and returns to PreInst.
>> 3. A potential delay which doesn't happen, where control needs to
>> fall back into the flow so it can continue to the next step.
>> 4. A potential delay which -does- happen, where control needs to fall
>> back into the flow and then out of it entirely so that the delay can
>> play out on the event queue.
>> 5. The flow being continued because whatever the CPU was waiting for
>> has happened.
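>>
>> One way those cases might be labeled in code (purely illustrative;
>> none of these names exist in the tree):
>>
>> // Hypothetical labels for the control-flow cases listed above.
>> enum FlowResult {
>>     NextInst, // 1. instruction finished; loop back to PreInst
>>     Faulted,  // 2. a fault was invoked; return to PreInst through it
>>     Continue, // 3. the potential delay didn't happen; next step
>>     Delayed   // 4. the delay did happen; fall out to the event queue
>> };
>> // Case 5 is the entry side: the awaited event fires and restarts
>> // the flow at the step that was waiting.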
>>
>> As I said, the way this works now is that each step calls the next if
>> it should happen immediately; otherwise the callback after the delay
>> starts things up again. That has the nice property of localizing a
>> lot of things to the point where they're relevant, like checking for
>> interrupts, and of letting the different pieces be started whenever
>> is convenient. I've already talked about the problems at length.
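>>
>> In code, that style looks roughly like the sketch below, built
>> around a hypothetical delayed-translation callback. The names are
>> illustrative; the real interfaces differ in detail:
>>
>> // Hypothetical sketch: a step either continues inline or is resumed
>> // by the finish() callback once the scheduled event fires.
>> class Translation
>> {
>>   public:
>>     virtual ~Translation() {}
>>     virtual void finish(bool fault) = 0; // called when the TLB is done
>> };
>>
>> class MemStep : public Translation
>> {
>>   public:
>>     void
>>     start()
>>     {
>>         if (!dtbLookup(this))  // no delay: handle the result inline
>>             finish(false);
>>         // otherwise just return; the event queue calls finish() later
>>     }
>>
>>     void
>>     finish(bool fault)         // the resumption point after any delay
>>     {
>>         if (fault) {
>>             invokeFault();     // head back to PreInst via the fault
>>             return;
>>         }
>>         completeAcc();         // then on to the next instruction
>>     }
>>
>>   private:
>>     bool dtbLookup(Translation *t); // true if completion is delayed
>>     void invokeFault();
>>     void completeAcc();
>> };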
>>
>> Instead of having every part call the following parts, what I'd like
>> to do is have a function which can be stopped and started at will and
>> which calls all the component operations as child peers.
>> Unfortunately, it's been really difficult coming up with something
>> that can do this efficiently and which provides an efficient
>> mechanism for all the possible forms of control flow I listed above.
>>
>> One idea I had was to set up a switch statement where each phase of
>> the execution flow was a case. Cases would not have breaks between
>> them, so that when execution should continue it would flow right into
>> the next. Individual phases could be jumped to directly to allow
>> restarting things after some sort of delay.
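>>
>> Roughly like this, with made-up stage names (a sketch of the idea,
>> not working code from the tree):
>>
>> // Hypothetical switch-based flow. A delay returns out of run(); the
>> // scheduled event later calls run() again with the stage to resume
>> // at, and undelayed execution falls straight through the cases.
>> enum Stage { PreInstStage, FetchStage, ExecStage, MemStage };
>>
>> void checkInterruptsAndPcEvents();
>> bool startFetch();   // true if the fetch was delayed on an event
>> bool execute();      // true if a memory access was delayed
>> void completeAcc();
>>
>> void
>> run(Stage resumeAt)
>> {
>>     while (true) {
>>         switch (resumeAt) {
>>           case PreInstStage:
>>             checkInterruptsAndPcEvents();
>>             // fall through
>>           case FetchStage:
>>             if (startFetch())  // the event will call run(ExecStage)
>>                 return;
>>             // fall through
>>           case ExecStage:
>>             if (execute())     // the event will call run(MemStage)
>>                 return;
>>             // fall through
>>           case MemStage:
>>             completeAcc();
>>         }
>>         resumeAt = PreInstStage; // next instruction, no stack growth
>>     }
>> }
>>
>> The while loop replaces the tail call back into PreInst, which is
>> what keeps the stack flat.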
>>
>> There are three major problems with this approach, though. First, the
>> execution flow as shown is not linear, so it can't be implemented
>> directly as a single fall-through chain of cases with no other
>> control flow. Second, it pulls decisions away from where they'd
>> otherwise be made locally; for instance, checking whether to actually
>> do a fetch moves away from the spot where the fetch would start.
>> Third, it provides no easy way to stop in the middle of things to
>> handle a fault without constantly checking whether there's one to
>> deal with.
>>
>> In order to allow faults I was thinking of some sort of try/catch
>> mechanism, but that just seems ugly.
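>>
>> For reference, that would look something like this (again
>> hypothetical; faults are normally passed around as objects rather
>> than thrown):
>>
>> // Hypothetical sketch: throw out of the middle of the flow when a
>> // fault occurs instead of checking a fault object after every step.
>> struct FaultException
>> {
>>     // could carry the Fault object itself
>> };
>>
>> void run(Stage resumeAt);    // the switch-based flow from above
>> void invokeFault(const FaultException &fe);
>>
>> void
>> tick(Stage resumeAt)
>> {
>>     try {
>>         run(resumeAt);
>>     } catch (const FaultException &fe) {
>>         invokeFault(fe);     // handle it and head back to PreInst
>>     }
>> }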
>>
>> The point of all this is, I think the way the CPU is built is broken
>> as a result of significant feature creep. I think conceptually my way
>> is better, but I'm having a hard time figuring out how to actually
>> implement it without it turning into a big ugly mess. If anybody has a
>> suggestion for how to make this work, please let me know.
>>
>> Gabe
