There is the original problem of reads and writes happening in place
and hence getting things out of order, but there's also the problem of
the call stack getting arbitrarily deep if execution manages to
continuously avoid anything that has a delay. The example I mentioned
was a microcode loop that doesn't touch memory, used, for instance, to
stall until an interrupt arrives or a countdown expires for a small
delay. These are really the two issues I'm most concerned about. The
rest of the stuff in there seems to be working all right, even though
it's hard to follow. If things need to be adjusted to deal with the
other issues anyway, and because the code is hard to follow, it would
be nice to reorganize all of it to make it easier to understand.

Gabe

Quoting Korey Sewell <[email protected]>:

> Depending on the issue and its complexity, you might expect the
> turnaround time on an email to vary.
>
> I actually had a draft response, but I was confused and didn't want to
> get "ranted on" for not knowing what I was talking about, so I held
> back.
>
> Here is a draft of what I was going to write...again *draft*...
> "I'm not necessarily convinced that the TimingSimpleCPU (TSC) needs a
> re-write just yet...
>
> If we all agree that the TSC should *ideally* follow the simplest
> chain of events possible to support a timing-mode memory, then
> whatever changes are made would hopefully accommodate that scheme, or
> hopefully take code that went off that path back to that "simple"
> path.
>
> For instance, Gabe mentioned that currently "each step calls the
>> next if it should happen immediately, and otherwise the callback after
>> the delay starts things up again."
>
> I kind of like that model, but it seems that things get broken when
> dependent callback events get stacked on top of each other (I *think*
> that's the major issue Gabe's trying to tackle here), since timing was
> introduced for TLBs and other aspects that were previously assumed
> to happen atomically.
>
> Would it make sense to
> (1) enforce a convention for dependent callback events such that no
> two events get stacked behind each other. Doing this would preserve
> the simple control flow and also allow for the timing to be modeled as
> well...
> (2) create states for events such that a dependent event can be
> queried for whether it's in progress or finished. This would allow
> the functions to skip scheduling an event that has already been
> called and not schedule a dependent event simultaneously."
>
> On Wed, May 6, 2009 at 2:38 AM, Gabe Black <[email protected]> wrote:
>> Anybody?
>>
>> Gabe Black wrote:
>>> If nobody has anything to say about the issue itself, letting me know
>>> which part of my ramblings is the least comprehensible would also be
>>> helpful.
>>>
>>> Gabe
>>>
>>> Gabriel Michael Black wrote:
>>>
>>>> My little diagram was missing a few "*"s. Here's a corrected version.
>>>> The "*"s after Exec and completeAcc are for faults that would happen
>>>> on the way back to PreInst.
>>>>
>>>> Gabe
>>>>
>>>> (PreInst)-*->(Fetch)/--------------------------->\/-(Exec)*
>>>>                      \--->(ITLB)-*|->(I cache)-|->/\-(initiateAcc)-*->
>>>>
>>>> (DTLB)-*|->(D cache)-|->(completeAcc)*
>>>>
>>>>
>>>> Quoting Gabriel Michael Black <[email protected]>:
>>>>
>>>>
>>>>
>>>>> Quoting Steve Reinhardt <[email protected]>:
>>>>>
>>>>>
>>>>>
>>>>>> On Sun, May 3, 2009 at 12:09 AM, Gabe Black  
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> While this does avoid the segfault, it also causes some other bug which
>>>>>>> crashes just about any of the simple timing regressions. I hadn't
>>>>>>> actually tried any of the quick regressions when I sent that out since
>>>>>>> my other testing had tricked me into thinking everything was fine. I
>>>>>>> think it has something to do with faulting accesses not  
>>>>>>> dealing with the
>>>>>>> fault right away and instead continuing into the remainder of
>>>>>>> completeIfetch. Rather than try to band-aid this into working, I'm
>>>>>>> thinking I'll just go for it and try to see what reorganizing
>>>>>>> the code buys me.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> It seems like anything that uses the timing-mode translation  
>>>>>> would have to
>>>>>> be prepared to not know whether a translation succeeds or not  
>>>>>> until a later
>>>>>> event is scheduled.... are you saying that this change exposes  
>>>>>> a fundamental
>>>>>> problem in the structure of the simple timing cpu with regard to how it
>>>>>> deals with timing-mode translation?  That's what it sounds like  
>>>>>> to me, but I
>>>>>> just wanted to clarify.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>>
>>>>>>
>>>>> Fundamental is probably too strong a word. Ingrained is probably
>>>>> better. The simple timing CPU is now pretty different from how it
>>>>> started life and I think that shows a bit. It's been split off of
>>>>> atomic, has delayable translation, microcode, unaligned accesses,
>>>>> variable instruction sizes, memory mapped registers, and there may be
>>>>> other things I'm forgetting. Those things have been folded in and are
>>>>> working, but I think a lot of the complexity comes from the code not
>>>>> being designed to accommodate them originally.
>>>>>
>>>>> This is actually a good opportunity to discuss how the timing CPU is
>>>>> put together and what I'd like to do with it. To start, this is what
>>>>> the lifetime of an instruction looks like. Points where the flow may
>>>>> be delayed using the event queue are marked with "|". Points where the
>>>>> flow may be halted by a fault are marked with a "*". This will
>>>>> probably also look like garbage without fixed width fonts.
>>>>>
>>>>> (PreInst)-*->(Fetch)/--------------------------->\/-(Exec)
>>>>>                      \--->(ITLB)-*|->(I cache)-|->/\-(initiateAcc)--->
>>>>>
>>>>> (DTLB)-*|->(D cache)-|->(completeAcc)
>>>>>
>>>>> The problem we started with is from initiateAcc going directly into
>>>>> the DTLB portion without finish. Generally, we can run into problems
>>>>> where we can go through this process avoiding all the "|"s or by
>>>>> coincidence not delaying on them and get farther and farther ahead of
>>>>> ourselves and/or build up a deeper and deeper pile of cruft on the
>>>>> stack. If a macroop is implemented, for some reason, to loop around
>>>>> and around inside itself waiting for, for instance, an interrupt to
>>>>> happen, all "|"s would be skipped and the call stack would build until
>>>>> it overflowed. What I would like to do, then, is structure the code so
>>>>> that calls never venture too far from their origin and return home
>>>>> before starting the next task.
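The stack-depth problem can be seen in miniature in the sketch below (names are illustrative only; `step` stands in for one phase of the flow above). Chaining each step directly into the next grows the stack on every iteration of a delay-free microcode loop, while returning to a single driver after each step keeps the depth constant:

```cpp
#include <cassert>

// step() stands in for one phase of the instruction flow; returning
// true means "continue immediately" (no "|" delay was hit).
bool step(int &count, int limit) {
    return ++count < limit;
}

// Chained style: each step calls the next directly. A microcode loop
// that never hits a delay recurses deeper on every iteration, until
// the stack eventually overflows.
void runChained(int &count, int limit) {
    if (step(count, limit))
        runChained(count, limit);
}

// Driver-loop style: control returns home after every step, so the
// stack stays flat no matter how many steps run back to back.
void runLooped(int &count, int limit) {
    while (step(count, limit)) {
        // calls never venture far from their origin
    }
}
```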
>>>>>
>>>>> To get there, there are several types of control flow to consider.
>>>>> 1. The end of the instruction where control loops back to PreInst
>>>>> (which checks for interrupts and pc related events)
>>>>> 2. A fault which is invoked and returns to PreInst.
>>>>> 3. A potential delay which doesn't happen, which needs to fall back
>>>>> into the flow so that it can continue to the next step.
>>>>> 4. A potential delay which -does- happen, which needs to fall back
>>>>> into the flow and then fall out of it so that the delay can happen
>>>>> in the event queue.
>>>>> 5. The flow being continued because whatever the CPU was waiting for
>>>>> has happened.
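One way to encode those five flows, sketched below with made-up names: every phase reports an outcome, and a single driver decides whether to continue, unwind for the event queue, or loop back to PreInst. Flow 5 is handled by the event's callback simply calling the driver again at the saved phase:

```cpp
#include <cassert>

// Illustrative only; none of these names exist in the real CPU.
enum class Outcome {
    Advance,   // (3) no delay: continue to the next phase
    Delayed,   // (4) delay taken: unwind so the event queue can run
    Faulted,   // (2) fault invoked: return to PreInst
    Done       // (1) end of instruction: loop back to PreInst
};

enum class Phase { PreInst, Fetch, Exec };

// The driver walks phases until something delays; (5) is the event's
// callback calling run() again with the phase this function returned.
Phase run(Phase start, Outcome (*doPhase)(Phase)) {
    Phase p = start;
    for (;;) {
        switch (doPhase(p)) {
          case Outcome::Advance:
            p = static_cast<Phase>(static_cast<int>(p) + 1);
            break;
          case Outcome::Faulted:
          case Outcome::Done:
            p = Phase::PreInst;   // both flows re-enter at PreInst
            break;
          case Outcome::Delayed:
            return p;             // resume here when the event fires
        }
    }
}
```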
>>>>>
>>>>> As I said, the way that this works now is that each step calls the
>>>>> next if it should happen immediately, and otherwise the callback after
>>>>> the delay starts things up again. That has the nice property of
>>>>> localizing a lot of things to the point where they're relevant, like
>>>>> checking for interrupts, and that the different pieces can be started
>>>>> whenever is convenient. I've talked about the problems at length.
>>>>>
>>>>> Instead of having every part call the following parts, what I'd like
>>>>> to do is have a function which can be stopped and started at will
>>>>> and which calls all the component operations as child peers.
>>>>> Unfortunately, it's been really difficult coming up with something
>>>>> that can do that efficiently and which provides an efficient
>>>>> mechanism for all the possible forms of control flow I listed above.
>>>>>
>>>>> One idea I had was to set up a switch statement where each phase of
>>>>> the execution flow was a case. Cases would not have breaks between
>>>>> them so that if execution should continue it would flow right into the
>>>>> next. The individual phases could be skipped to directly to allow
>>>>> restarting things after some sort of delay.
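A minimal sketch of that switch idea, with hypothetical phase names: cases fall through when execution should continue, and restarting after a delay means entering the switch at the saved phase:

```cpp
#include <cassert>

// Illustrative only: three phases as fall-through cases. After a delay,
// the event handler would just call advance() with the saved phase to
// jump back into the middle of the flow.
enum Phase { PreInst, Fetch, Exec, NumPhases };

struct FlowSketch {
    int visited[NumPhases] = {0, 0, 0};
    void advance(Phase start) {
        switch (start) {
          case PreInst:
            ++visited[PreInst];
            // no break: flow continues straight into the next phase
          case Fetch:
            ++visited[Fetch];
            // no break
          case Exec:
            ++visited[Exec];
            break;
          default:
            break;
        }
    }
};
```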
>>>>>
>>>>> There are three major problems with this approach, though. First, the
>>>>> execution flow as shown is not linear, so it can't be implemented
>>>>> directly as a single chain of events with no control flow. Second, it
>>>>> pulls decisions away from where they'd be made locally, e.g. moving
>>>>> the check of whether to actually do a fetch away from the point where
>>>>> the fetch would start. Third, it provides no easy way to stop in the
>>>>> middle of things to handle a fault without constantly checking if
>>>>> there's one to deal with.
>>>>>
>>>>> In order to allow faults I was thinking of some sort of try/catch
>>>>> mechanism, but that just seems ugly.
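For what it's worth, the try/catch version might look like the sketch below (hypothetical `Fault` type and phase names): a faulting phase throws, and a single handler at the top unwinds straight back toward PreInst without every intermediate caller checking a return value:

```cpp
#include <cassert>

// Hypothetical fault type and phases, illustrating the try/catch idea.
struct Fault { const char *name; };

int faultsTaken = 0;

void fetchPhase(bool faulting) {
    if (faulting)
        throw Fault{"fetch-fault"};   // a "*" point halting the flow
    // ... normal fetch work ...
}

void runInst(bool faulting) {
    try {
        fetchPhase(faulting);
        // ... Exec / initiateAcc / completeAcc would follow here ...
    } catch (const Fault &) {
        ++faultsTaken;   // invoke the fault, then back to PreInst
    }
}
```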
>>>>>
>>>>> The point of all this is, I think the way the CPU is built is broken
>>>>> as a result of significant feature creep. I think conceptually my way
>>>>> is better, but I'm having a hard time figuring out how to actually
>>>>> implement it without it turning into a big ugly mess. If anybody has a
>>>>> suggestion for how to make this work, please let me know.
>>>>>
>>>>> Gabe
>>>>> _______________________________________________
>>>>> m5-dev mailing list
>>>>> [email protected]
>>>>> http://m5sim.org/mailman/listinfo/m5-dev
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
>
> --
> ===========
> Korey L Sewell
> PhD Candidate
> Computer Science & Engineering
> University of Michigan
>


