Re: [m5-dev] cleaning up TimingSimpleCPU

Gabe Black Tue, 13 Jul 2010 11:43:51 -0700

Gabe Black wrote:
> Steve Reinhardt wrote:
>   
>> All the discussion of different extensions to TimingSimpleCPU got me
>> thinking again about what a mess it is.  I walked through the code
>> with Brad & Joel a few weeks ago, and it's still the same basic
>> structure of everything being driven by callbacks, with numerous cases
>> where we call the next callback directly because some stage is getting
>> bypassed.  That was confusing enough already, but now we have about
>> twice as many of these situations, and several different ways of
>> implementing them (some callbacks come via ports, then there's
>> WholeTranslationState::finish() which uses a virtual function override
>> (that's redirected in simple/timing.hh just to keep you on your toes),
>> then there's DataTranslation which derives from WholeTranslationState
>> and catches the finish() method and redirects it to
>> finishTranslation() using a template...).
>>
>> I'm not sure there's a good solution to the
>> sometimes-bypassed-chained-callbacks structure (it seems inherent in
>> the way it needs to work) other than good documentation.  But if we
>> regularize how those callbacks are handled that would help a lot. One
>> way to do this is to pass translation requests to the TLBs via ports
>> (e.g., dtb->sendTiming(rqst)).  Then everything would be
>> message-driven, and all the callbacks would come through different
>> ports.  Once you understand how ports work then you could figure it
>> out yourself.
>>
>> A second step that's somewhat independent but still seems nicely
>> complementary is to push all the unaligned access ugliness out of the
>> CPU.  The basic steps wouldn't change much, but the complexity would
>> be hidden from the CPU, and could be omitted for ISAs that don't have
>> to deal with it.  The cleanest way would be to create a shim object
>> that takes a potentially unaligned request from the CPU and does the
>> split/recombine if it is a line/page crosser but just forwards it
>> otherwise.  I think we'd definitely want to go this way for the
>> caches, since we don't really want to push the complexity into the
>> cache either, but I could see skipping the shim and just embedding the
>> logic in the TLB for the ISAs that need it, since the TLB is already
>> ISA-specific (though we'd still want to use a common mechanism like
>> the WholeTranslationState thing).
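
The split step such a shim would perform can be sketched as below. This is only an illustration under assumptions: `Fragment` and `splitAccess` are hypothetical names, not M5's Request or Packet classes, and the block size is taken as a plain parameter. It covers only the splitting half; recombining the responses is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical type for illustration; not an M5 class.
struct Fragment {
    std::uint64_t addr;
    unsigned size;
};

// Split [addr, addr + size) at every multiple of blockSize it crosses.
// An access that stays inside one block comes back as a single fragment,
// which a shim could forward unchanged.
std::vector<Fragment> splitAccess(std::uint64_t addr, unsigned size,
                                  unsigned blockSize)
{
    assert(blockSize > 0 && size > 0);
    std::vector<Fragment> fragments;
    while (size > 0) {
        // Bytes left before the next block boundary.
        unsigned room = blockSize - static_cast<unsigned>(addr % blockSize);
        unsigned chunk = size < room ? size : room;
        fragments.push_back({addr, chunk});
        addr += chunk;
        size -= chunk;
    }
    return fragments;
}
```

The same helper would serve for line crossers and page crossers, since only the `blockSize` argument differs.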
>>
>> This mechanism could then work for all the CPU models... was there a
>> reason we didn't do it this way in the first place?  If we thought it
>> would be too much overhead, I say forget it, at this point I'm willing
>> to pay a little runtime overhead to clean up this code.  And I'm not
>> sure it would be any more overhead than what we already have anyway.
>>
>> Thoughts?  Volunteers?  :-)
>>
>> Steve
>> _______________________________________________
>> m5-dev mailing list
>> [email protected]
>> http://m5sim.org/mailman/listinfo/m5-dev
>>   
>>     
>
> I can't say it was -the- reason, but one reason is that the TLBs as is
> don't actually send the packets for the CPU, so they can't split
> anything into multiple transactions easily. I'm intrigued by the idea of
> putting the TLB behind a port or port-like interface, maybe even
> exporting the TLB outside of the CPU's guts and putting it inline with
> external accesses. There are three problems with that, though. First,
> the TLB would likely need some alternative way to pass a fault back to
> the CPU. Maybe the request would have a fault pointer field? Second, the
> TLB is the thing that recognizes when an access is to memory mapped
> control state within the CPU. It would need a way to communicate with
> the CPU to get/set those values. Third, the control state that actually
> -runs- the TLB is maintained by the CPU, namely what mode it's in, etc.
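
One way to picture the "fault pointer field" idea is the sketch below. All names here (`Fault`, `TranslationRequest`, `ToyTlb`) are hypothetical stand-ins rather than M5's classes, and the "page table" is just an identity mapping below a limit, assumed purely for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical types for illustration; not M5's Fault or Request classes.
struct Fault {
    std::string name;
};

struct TranslationRequest {
    std::uint64_t vaddr = 0;
    std::uint64_t paddr = 0;
    std::shared_ptr<Fault> fault;  // null means the translation succeeded
};

// Toy TLB: identity-maps addresses below mappedLimit, faults on the rest.
class ToyTlb {
  public:
    explicit ToyTlb(std::uint64_t mappedLimit) : mappedLimit_(mappedLimit)
    {
        assert(mappedLimit > 0);
    }

    // The result travels back inside the request itself, so the caller
    // needs no separate return path for faults.
    void translate(TranslationRequest &req) const
    {
        if (req.vaddr < mappedLimit_) {
            req.paddr = req.vaddr;
            req.fault.reset();
        } else {
            req.fault = std::make_shared<Fault>(Fault{"PageFault"});
        }
    }

  private:
    std::uint64_t mappedLimit_;
};
```

The CPU side would then check `req.fault` when the request comes back, whichever path it came back on.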
>
> This also brings up another idea I've been rolling around for a while.
> Why is all the control state local to the miscregfile/its descendant the
> ISA object? Why don't we put control state that matters to the TLB, or
> at least a copy of it, in the TLB itself and then communicate it back
> and forth as necessary? That would be easier to code (or at least I'm
> guessing) since you'd just have the state right there, faster since it
> avoids calling out for it, and would more conceptually match real
> hardware where all the control state isn't put in one huge blob
> someplace. The same thing could be done for other structures like the
> interrupt controller, and maybe the decoder and/or predecoder. Speaking
> of the decoder, it would be nice to make that a little stateful as well.
> As it is in, say, ARM, the decoder has to rediscover what mode it's in
> over and over. I'm guessing it would be better to explicitly switch its
> state (or it entirely) when changing modes instead, although that might
> add a fair amount of complexity. Perhaps the decoder should be an object
> instead of a bare function? I'm less sure how that would work. It could,
> hypothetically, allow us to return the two PC bits commandeered to
> signal the mode.
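
A decoder that is an object with explicitly switched state might look roughly like the sketch below. `Mode` and `ToyDecoder` are hypothetical names, and the fixed 4-byte/2-byte instruction widths are a deliberate simplification assumed for the example, not a description of the real ARM decoder.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; not the M5 decoder interface.
enum class Mode { Arm, Thumb };

class ToyDecoder {
  public:
    // The mode is switched explicitly when the CPU changes modes,
    // instead of being rediscovered on every decode.
    void setMode(Mode m) { mode_ = m; }
    Mode mode() const { return mode_; }

    // Simplified widths: 4 bytes in Arm mode, 2 bytes in Thumb mode.
    unsigned instBytes() const { return mode_ == Mode::Thumb ? 2u : 4u; }

    // Because the mode lives in the decoder, the PC carries no mode bits
    // and is simply expected to be aligned to the instruction width.
    std::uint64_t nextPc(std::uint64_t pc) const
    {
        assert(pc % instBytes() == 0);
        return pc + instBytes();
    }

  private:
    Mode mode_ = Mode::Arm;
};
```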
>
> Gabe
>


There are at least two reasons not to go directly from the TLB to memory:
store-to-load forwarding, and keeping translation as a step separate from
execution so that you can, say, translate ahead of time but not actually
send out a store until it's committed. I do still like spreading out the
control state, though.
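
Both reasons can be seen in a small sketch of a store queue that holds already-translated stores until commit. `PendingStore` and `ToyStoreQueue` are hypothetical names, and a `std::map` stands in for memory; none of this is taken from the M5 source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>

// Hypothetical types for illustration; not M5 classes.
struct PendingStore {
    std::uint64_t paddr;  // already translated
    std::uint32_t data;
};

class ToyStoreQueue {
  public:
    // Translation has happened ahead of time; nothing reaches memory yet.
    void addTranslated(std::uint64_t paddr, std::uint32_t data)
    {
        pending_.push_back({paddr, data});
    }

    // Store-to-load forwarding: the youngest matching store wins.
    bool forward(std::uint64_t paddr, std::uint32_t &data) const
    {
        for (auto it = pending_.rbegin(); it != pending_.rend(); ++it) {
            if (it->paddr == paddr) {
                data = it->data;
                return true;
            }
        }
        return false;
    }

    // Only at commit is the oldest store actually sent to memory.
    void commitOldest(std::map<std::uint64_t, std::uint32_t> &memory)
    {
        assert(!pending_.empty());
        memory[pending_.front().paddr] = pending_.front().data;
        pending_.pop_front();
    }

    std::size_t size() const { return pending_.size(); }

  private:
    std::deque<PendingStore> pending_;
};
```

If the TLB sent the access straight to memory, there would be no point between `addTranslated` and `commitOldest` at which a later load could be satisfied from the queue.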

Gabe
