Quoting Steve Reinhardt <[email protected]>:
> On Sun, May 3, 2009 at 12:09 AM, Gabe Black <[email protected]> wrote:
>
>> While this does avoid the segfault, it also causes some other bug which
>> crashes just about any of the simple timing regressions. I hadn't
>> actually tried any of the quick regressions when I sent that out since
>> my other testing had tricked me into thinking everything was fine. I
>> think it has something to do with faulting accesses not dealing with the
>> fault right away and instead continuing into the remainder of
>> completeIfetch. Rather than try to bandaid this into working, I'm
>> thinking I'll just going to go for it and try to see what reorganizing
>> the code buys me.
>>
>
> It seems like anything that uses the timing-mode translation would have to
> be prepared to not know whether a translation succeeds or not until a later
> event is scheduled.... are you saying that this change exposes a fundamental
> problem in the structure of the simple timing cpu with regard to how it
> deals with timing-mode translation? That's what it sounds like to me, but I
> just wanted to clarify.
>
> Thanks,
>
> Steve
>
Fundamental is probably too strong a word. Ingrained is probably
better. The simple timing CPU is now pretty different from how it
started life and I think that shows a bit. It's been split off of
atomic, has delayable translation, microcode, unaligned accesses,
variable instruction sizes, memory mapped registers, and there may be
other things I'm forgetting. Those things have been folded in and are
working, but I think a lot of the complexity is that the code wasn't
designed to accommodate them originally.

This is actually a good opportunity to discuss how the timing CPU is
put together and what I'd like to do with it. To start, this is what
the lifetime of an instruction looks like. Points where the flow may
be delayed using the event queue are marked with "|". Points where the
flow may be halted by a fault are marked with a "*". This will
probably also look like garbage without fixed width fonts.

(PreInst)-*->(Fetch)/--------------------------->\/-(Exec)
                    \--->(ITLB)-*|->(I cache)-|->/\-(initiateAcc)--->
(DTLB)-*|->(D cache)-|->(completeAcc)

The problem we started with comes from initiateAcc going directly into
the DTLB portion without finishing first. More generally, we can run
into problems when we go through this process skipping all the "|"s, or
by coincidence not delaying on them, and get farther and farther ahead
of ourselves and/or build up a deeper and deeper pile of cruft on the
stack. If a macroop is implemented, for some reason, to loop around and
around inside itself waiting for, say, an interrupt to happen, all the
"|"s would be skipped and the call stack would grow until it
overflowed. What I would like to do, then, is structure the code so
that calls never venture too far from their origin and return home
before starting the next task.

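The stack growth described here can be made concrete with a small sketch. This is not M5 code: DepthProbe, chainedStep and loopedStep are hypothetical names, and each "step" stands in for a phase that happened not to be delayed. It only compares how deep the calls nest when every step calls the next one versus when every step returns home first.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the CPU: it only tracks how deeply calls nest.
struct DepthProbe {
    int depth = 0;     // current nesting of step calls
    int maxDepth = 0;  // deepest nesting seen so far

    void enter() { ++depth; maxDepth = std::max(maxDepth, depth); }
    void leave() { assert(depth > 0); --depth; }
};

// Chained style: each step calls the next one when no delay intervenes.
void chainedStep(DepthProbe &probe, int stepsLeft) {
    probe.enter();
    if (stepsLeft > 1)
        chainedStep(probe, stepsLeft - 1);  // next step runs inside this one
    probe.leave();
}

// Looped style: each step returns home before the next one starts.
void loopedStep(DepthProbe &probe) {
    probe.enter();
    probe.leave();
}

void runLooped(DepthProbe &probe, int steps) {
    for (int i = 0; i < steps; ++i)
        loopedStep(probe);
}

// Helpers that report the deepest nesting reached by each style.
int maxDepthChained(int steps) {
    DepthProbe probe;
    chainedStep(probe, steps);
    return probe.maxDepth;
}

int maxDepthLooped(int steps) {
    DepthProbe probe;
    runLooped(probe, steps);
    return probe.maxDepth;
}
```

In the chained style the nesting grows with the number of undelayed steps, while in the looped style it stays constant no matter how many steps run back to back.
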
To get there, there are several types of control flow to consider.

1. The end of the instruction, where control loops back to PreInst
   (which checks for interrupts and PC-related events).
2. A fault, which is invoked and then returns to PreInst.
3. A potential delay which doesn't happen, where control needs to fall
   back to the flow so that it can continue to the next step.
4. A potential delay which -does- happen, where control needs to fall
   back to the flow and then fall out of it so that the delay can happen
   in the event queue.
5. The flow being continued because whatever the CPU was waiting for
   has happened.

As I said, the way this works now is that each step calls the next if
it should happen immediately; otherwise the callback after the delay
starts things up again. That has the nice property of localizing a lot
of things, like checking for interrupts, to the point where they're
relevant, and it means the different pieces can be started whenever is
convenient. I've talked about the problems at length.

Instead of having every part call the following parts, what I'd like
to do is have a function which can be stopped and started at will and
which calls all the component operations as child peers.
Unfortunately, it's been really difficult coming up with something
that can do this efficiently and which provides an efficient mechanism
for all the possible forms of control flow I listed above.

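One possible shape for such a function, as a rough sketch only: a driver that owns the list of steps and calls them as peers, with each step reporting whether to continue, wait, or fault. Everything here (InstFlow, Outcome, Stop, runDelayScenario) is a hypothetical name and not part of the simulator, and it assumes the flow is a simple linear list of steps, which the real flow is not.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// What a step reports back to the driver (names are illustrative).
enum class Outcome { Continue, Delayed, Fault };

// Why the driver returned to its caller.
enum class Stop { InstructionDone, WaitingOnEvent, FaultHandled };

class InstFlow {
  public:
    using Step = std::function<Outcome()>;

    explicit InstFlow(std::vector<Step> steps) : steps_(std::move(steps)) {}

    // Run steps one after another until one delays or faults. Called to
    // start an instruction, and again from a (hypothetical) event callback
    // once whatever was being waited for has happened.
    Stop run() {
        while (next_ < steps_.size()) {
            assert(steps_[next_] != nullptr);
            Outcome result = steps_[next_]();
            ++next_;  // on resume, pick up after the step that delayed
            if (result == Outcome::Delayed)
                return Stop::WaitingOnEvent;  // fall out to the event queue
            if (result == Outcome::Fault) {
                next_ = 0;  // a real CPU would invoke the fault here
                return Stop::FaultHandled;
            }
        }
        next_ = 0;  // end of instruction: loop back to the first step
        return Stop::InstructionDone;
    }

    std::size_t nextStep() const { return next_; }

  private:
    std::vector<Step> steps_;
    std::size_t next_ = 0;
};

// Small scenario: the middle of three steps is delayed.
std::vector<Stop> runDelayScenario() {
    InstFlow flow({
        [] { return Outcome::Continue; },  // e.g. a step that is not delayed
        [] { return Outcome::Delayed; },   // e.g. a translation that waits
        [] { return Outcome::Continue; },  // e.g. the final step
    });
    std::vector<Stop> stops;
    stops.push_back(flow.run());  // stops after the delayed step
    stops.push_back(flow.run());  // resumed: finishes the instruction
    return stops;
}
```

Each of the five kinds of control flow listed earlier maps onto a return value or a call to run(), and no step ever calls another step, so the call stack stays shallow. The cost is that the decision about what runs next lives in the driver rather than next to the step it concerns.
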
One idea I had was to set up a switch statement where each phase of
the execution flow was a case. Cases would not have breaks between
them, so that if execution should continue it would flow right into the
next. The individual phases could be jumped to directly to allow
restarting things after some sort of delay.

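A minimal sketch of the switch idea, with only three phases and made-up names (SwitchFlow, Phase, fetchDelayed, runWithDelayedFetch are all assumptions for illustration). The log string just records which phases ran, so the restart behaviour can be checked.

```cpp
#include <cassert>
#include <string>

// Phases of the flow, in order (a simplified, illustrative subset).
enum class Phase { PreInst, Fetch, Exec };

struct SwitchFlow {
    Phase resumeAt = Phase::PreInst;  // case that the next run() enters at
    bool fetchDelayed = false;        // pretend the fetch has to wait
    std::string log;                  // record of the phases that ran

    // Returns true if the instruction finished, false if it stopped to wait.
    bool run() {
        switch (resumeAt) {
          case Phase::PreInst:
            log += "P";
            // fall through
          case Phase::Fetch:
            log += "F";
            if (fetchDelayed) {
                fetchDelayed = false;    // the wait is satisfied next time
                resumeAt = Phase::Exec;  // re-enter at the following phase
                return false;            // fall out to the event queue
            }
            // fall through
          case Phase::Exec:
            log += "E";
            break;
        }
        resumeAt = Phase::PreInst;  // next instruction starts from the top
        return true;
    }
};

// Scenario: the fetch is delayed once, so run() has to be called twice.
std::string runWithDelayedFetch() {
    SwitchFlow flow;
    flow.fetchDelayed = true;
    bool finished = flow.run();  // runs PreInst and Fetch, then stops
    assert(!finished);
    finished = flow.run();       // re-enters directly at Exec
    assert(finished);
    return flow.log;
}
```

Each phase runs exactly once even though the flow was interrupted in the middle, which is the attraction of the approach. The sketch only works because its three phases form a straight line.
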
There are three major problems with this approach though. First, the
execution flow as shown is not linear, so it can't be implemented
directly as a single chain of events with no control flow. Second, it
pulls decisions away from where they'd be made locally, i.e. it
separates the check for whether to actually do a fetch from the point
where the fetch would start. Third, it provides no easy way to stop in
the middle of things to handle a fault without constantly checking if
there's one to deal with.

In order to allow faults I was thinking of some sort of try/catch
mechanism, but that just seems ugly.

The point of all this is, I think the way the CPU is built is broken
as a result of significant feature creep. I think conceptually my way
is better, but I'm having a hard time figuring out how to actually
implement it without it turning into a big ugly mess. If anybody has a
suggestion for how to make this work, please let me know.

Gabe
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev