Quoting Steve Reinhardt <[email protected]>:

> On Sun, May 3, 2009 at 12:09 AM, Gabe Black <[email protected]> wrote:
>
>> While this does avoid the segfault, it also causes some other bug which
>> crashes just about any of the simple timing regressions. I hadn't
>> actually tried any of the quick regressions when I sent that out since
>> my other testing had tricked me into thinking everything was fine. I
>> think it has something to do with faulting accesses not dealing with the
>> fault right away and instead continuing into the remainder of
>> completeIfetch. Rather than try to bandaid this into working, I'm
>> thinking I'll just go for it and try to see what reorganizing
>> the code buys me.
>>
>
> It seems like anything that uses the timing-mode translation would have to
> be prepared to not know whether a translation succeeds or not until a later
> event is scheduled.... are you saying that this change exposes a fundamental
> problem in the structure of the simple timing cpu with regard to how it
> deals with timing-mode translation?  That's what it sounds like to me, but I
> just wanted to clarify.
>
> Thanks,
>
> Steve
>

Fundamental is probably too strong a word. Ingrained is probably
better. The simple timing CPU is now pretty different from how it  
started life and I think that shows a bit. It's been split off of  
atomic, has delayable translation, microcode, unaligned accesses,  
variable instruction sizes, memory mapped registers, and there may be  
other things I'm forgetting. Those things have been folded in and are
working, but I think a lot of the complexity comes from the fact that
the code wasn't designed to accommodate them originally.

This is actually a good opportunity to discuss how the timing CPU is  
put together and what I'd like to do with it. To start, this is what  
the lifetime of an instruction looks like. Points where the flow may  
be delayed using the event queue are marked with "|". Points where the  
flow may be halted by a fault are marked with a "*". This will  
probably also look like garbage without fixed width fonts.

(PreInst)-*->(Fetch)/--------------------------->\/-(Exec)
                     \--->(ITLB)-*|->(I cache)-|->/\-(initiateAcc)--->

(DTLB)-*|->(D cache)-|->(completeAcc)

The problem we started with comes from initiateAcc going directly into
the DTLB portion without finishing first. Generally, we can run into problems
where we can go through this process avoiding all the "|"s or by  
coincidence not delaying on them and get farther and farther ahead of  
ourselves and/or build up a deeper and deeper pile of cruft on the  
stack. If a macroop is implemented, for some reason, to loop around  
and around inside itself waiting for, for instance, an interrupt to  
happen, all "|"s would be skipped and the call stack would build until  
it overflowed. What I would like to do, then, is structure the code so  
that calls never venture too far from their origin and return home  
before starting the next task.
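
As a rough illustration of the stack problem described above, here is a
minimal sketch that compares phases calling each other directly with
phases called from a single loop. Everything in it (DepthMeter,
ChainedCpu, LoopCpu, and the two-phase fetch/execute flow) is a
hypothetical stand-in and not code from the simulator. It assumes no
"|" ever causes a delay.

```cpp
#include <cassert>
#include <cstddef>

// Tracks how deeply phase calls are nested (hypothetical helper).
struct DepthMeter {
    std::size_t depth = 0;     // current nesting of phase calls
    std::size_t maxDepth = 0;  // deepest nesting seen so far
    void enter() { if (++depth > maxDepth) maxDepth = depth; }
    void leave() { --depth; }
};

// Each phase calls the next one directly, so nothing returns until the
// last instruction is done.
struct ChainedCpu : DepthMeter {
    int remaining;  // instructions left to run
    explicit ChainedCpu(int n) : remaining(n) {}
    void fetch()   { enter(); execute(); leave(); }
    void execute() { enter(); if (--remaining > 0) fetch(); leave(); }
};

// Each phase returns home to run() before the next one starts.
struct LoopCpu : DepthMeter {
    int remaining;
    explicit LoopCpu(int n) : remaining(n) {}
    void fetch()   { enter(); leave(); }
    void execute() { enter(); --remaining; leave(); }
    void run() {
        enter();
        while (remaining > 0) { fetch(); execute(); }
        leave();
    }
};
```

In this toy model the chained version nests two calls per instruction,
so its depth grows with the number of instructions, while the loop
version never nests deeper than run() plus one phase.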

To get there, there are several types of control flow to consider.
1. The end of the instruction where control loops back to PreInst  
(which checks for interrupts and pc related events)
2. A fault which is invoked and returns to PreInst.
3. A potential delay which doesn't happen which needs to fall back to  
the flow so that it can continue to the next step.
4. A potential delay which -does- happen which needs to fall back to  
the flow and then fall out of it so that the delay can happen in the  
event queue.
5. The flow being continued because whatever the CPU was waiting for  
has happened.

As I said, the way that this works now is that each step calls the  
next if it should happen immediately, and otherwise the callback after  
the delay starts things up again. That has the nice property of  
localizing a lot of things to the point where they're relevant, like  
checking for interrupts, and that the different pieces can be started  
whenever is convenient. I've talked about the problems at length.

Instead of having every part call the following parts, what I'd like
to do is have a function which can be stopped and started at will
and which calls all the component operations as its children, so that
they are peers of each other. Unfortunately, it's been really difficult
coming up with something that can do that efficiently and which provides
an efficient mechanism for all the possible forms of control flow I
listed above.
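
One possible shape for such a function, offered only as a sketch: each
phase reports what happened through a status code, and a single driver
decides what to do next. The names Status, Flow, Phase, run, and the toy
phases are all assumptions made for illustration, and the "limit"
argument exists only so the example terminates. The numbers in the
comments refer to the five types of control flow listed earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result codes reported by each phase.
enum class Status { Next, Delayed, Fault, Done };

struct Flow;
using Phase = Status (*)(Flow &);

struct Flow {
    std::vector<Phase> phases;  // peer operations, in program order
    std::size_t current = 0;    // phase to run when the flow (re)starts
    int retired = 0;            // instructions completed
    int faults = 0;             // faults invoked

    // Runs phases until one is delayed (returns true; the event callback
    // is expected to call run() again, which is type 5) or until `limit`
    // instructions have retired (returns false).
    bool run(int limit) {
        assert(!phases.empty());
        while (retired < limit) {
            Status s = phases[current](*this);
            switch (s) {
              case Status::Next:     // type 3: no delay, take the next step
                current = (current + 1) % phases.size();
                break;
              case Status::Delayed:  // type 4: fall out of the flow
                current = (current + 1) % phases.size();
                return true;
              case Status::Fault:    // type 2: invoke fault, back to PreInst
                ++faults;
                current = 0;
                break;
              case Status::Done:     // type 1: end of the instruction
                ++retired;
                current = 0;
                break;
            }
        }
        return false;
    }
};

// Toy phases for illustration only.
Status preInst(Flow &) { return Status::Next; }
Status fetch(Flow &)   { return Status::Delayed; }  // always has to wait
Status exec(Flow &)    { return Status::Done; }
```

The cost of this arrangement is that every phase result goes through
the driver's switch, so the fault check is centralized but still
happens after every step.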

One idea I had was to set up a switch statement where each phase of  
the execution flow was a case. Cases would not have breaks between  
them so that if execution should continue it would flow right into the  
next. The individual phases could be jumped to directly to allow
restarting things after some sort of delay.
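
A minimal sketch of that switch idea, assuming a simplified linear flow
of just three phases. The names SwitchCpu, Phase, resumeAt, advance, and
the icacheMiss flag are hypothetical and only stand in for the real
phases and delay conditions.

```cpp
#include <cassert>
#include <string>

enum class Phase { PreInst, Fetch, Exec };

struct SwitchCpu {
    Phase resumeAt = Phase::PreInst;  // case to enter on the next call
    bool icacheMiss = false;          // hypothetical: Fetch has to wait
    std::string trace;                // records which phases ran

    // Returns true if the flow stopped to wait for an event, false if
    // the instruction ran to completion.
    bool advance() {
        switch (resumeAt) {
          case Phase::PreInst:
            trace += 'P';
            // fall through
          case Phase::Fetch:
            trace += 'F';
            if (icacheMiss) {
                icacheMiss = false;
                resumeAt = Phase::Exec;  // restart here after the delay
                return true;
            }
            // fall through
          case Phase::Exec:
            trace += 'E';
            resumeAt = Phase::PreInst;   // loop back for the next instruction
            return false;
        }
        return false;
    }
};
```

With icacheMiss set, the first call to advance() runs PreInst and Fetch
and then stops; the second call enters the switch at Exec and finishes
the instruction.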

There are three major problems with this approach though. First, the  
execution flow as shown is not linear, so it can't be implemented  
directly as a single chain of events with no control flow. Second, it
pulls decisions away from where they'd be made locally; for example,
the check for whether to actually do a fetch ends up separated from
where the fetch would start. Third, it provides no easy way to stop in the
middle of things to handle a fault without constantly checking if  
there's one to deal with.

In order to allow faults I was thinking of some sort of try/catch  
mechanism, but that just seems ugly.

The point of all this is, I think the way the CPU is built is broken
as a result of significant feature creep. I think conceptually my way  
is better, but I'm having a hard time figuring out how to actually  
implement it without it turning into a big ugly mess. If anybody has a  
suggestion for how to make this work, please let me know.

Gabe
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
