Gabe Black wrote:
> Steve Reinhardt wrote:
>
>> All the discussion of different extensions to TimingSimpleCPU got me
>> thinking again about what a mess it is. I walked through the code
>> with Brad & Joel a few weeks ago, and it's still the same basic
>> structure of everything being driven by callbacks, with numerous cases
>> where we call the next callback directly because some stage is getting
>> bypassed. That was confusing enough already, but now we have about
>> twice as many of these situations, and several different ways of
>> implementing them (some callbacks come via ports, then there's
>> WholeTranslationState::finish() which uses a virtual function override
>> (that's redirected in simple/timing.hh just to keep you on your toes),
>> then there's DataTranslation which derives from WholeTranslationState
>> and catches the finish() method and redirects it to
>> finishTranslation() using a template...).
>>
>> I'm not sure there's a good solution to the
>> sometimes-bypassed-chained-callbacks structure (it seems inherent in
>> the way it needs to work) other than good documentation. But if we
>> regularize how those callbacks are handled that would help a lot. One
>> way to do this is to pass translation requests to the TLBs via ports
>> (e.g., dtb->sendTiming(rqst)). Then everything would be
>> message-driven, and all the callbacks would come through different
>> ports. Once you understand how ports work then you could figure it
>> out yourself.
>>
>> A second step that's somewhat independent but still seems nicely
>> complementary is to push all the unaligned access ugliness out of the
>> CPU. The basic steps wouldn't change much, but the complexity would
>> be hidden from the CPU, and could be omitted for ISAs that don't have
>> to deal with it. The cleanest way would be to create a shim object
>> that takes a potentially unaligned request from the CPU and does the
>> split/recombine if it is a line/page crosser but just forwards it
>> otherwise. I think we'd definitely want to go this way for the
>> caches, since we don't really want to push the complexity into the
>> cache either, but I could see skipping the shim and just embedding the
>> logic in the TLB for the ISAs that need it, since the TLB is already
>> ISA-specific (though we'd still want to use a common mechanism like
>> the WholeTranslationState thing).
>>
>> This mechanism could then work for all the CPU models... was there a
>> reason we didn't do it this way in the first place? If we thought it
>> would be too much overhead, I say forget it, at this point I'm willing
>> to pay a little runtime overhead to clean up this code. And I'm not
>> sure it would be any more overhead than what we already have anyway.
>>
>> Thoughts? Volunteers? :-)
>>
>> Steve
>> _______________________________________________
>> m5-dev mailing list
>> [email protected]
>> http://m5sim.org/mailman/listinfo/m5-dev
>
> I can't say it was -the- reason, but one reason is that the TLBs as is
> don't actually send the packets for the CPU, so they can't split
> anything into multiple transactions easily. I'm intrigued by the idea of
> putting the TLB behind a port or port-like interface, maybe even
> exporting the TLB outside of the CPU's guts and putting it inline with
> external accesses. There are three problems with that, though. First,
> the TLB would likely need some alternative way to pass a fault back to
> the CPU. Maybe the request would have a fault pointer field? Second, the
> TLB is the thing that recognizes when an access is to memory-mapped
> control state within the CPU. It would need a way to communicate with
> the CPU to get/set those values. Third, the control state that actually
> -runs- the TLB is maintained by the CPU, namely what mode it's in, etc.
>
> This also brings up another idea I've been rolling around for a while.
> Why is all the control state local to the miscregfile/its descendant,
> the ISA object? Why don't we put control state that matters to the TLB,
> or at least a copy of it, in the TLB itself and then communicate it back
> and forth as necessary? That would be easier to code (or at least I'm
> guessing) since you'd just have the state right there, faster since it
> avoids calling out for it, and would more conceptually match real
> hardware where all the control state isn't put in one huge blob
> someplace. The same thing could be done for other structures like the
> interrupt controller, and maybe the decoder and/or predecoder. Speaking
> of the decoder, it would be nice to make that a little stateful as well.
> As it is in, say, ARM, the decoder has to rediscover what mode it's in
> over and over. I'm guessing it would be better to explicitly switch its
> state (or switch it out entirely) when changing modes instead, although
> that might add a fair amount of complexity. Perhaps the decoder should
> be an object instead of a bare function? I'm less sure how that would
> work. It could, hypothetically, allow us to return the two PC bits
> commandeered to signal the mode.
>
> Gabe
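
For concreteness, the split half of the shim Steve describes could look
roughly like the sketch below. This is only an illustration under
simplifying assumptions: SimpleReq and splitAtBoundary are made-up names,
not M5 classes, the block size is assumed to be a power of two, and the
recombine step, fault handling and port plumbing are all left out.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical, stripped-down request; this is NOT M5's Request class.
struct SimpleReq {
    std::uint64_t addr;  // starting address of the access
    unsigned size;       // access size in bytes
};

// Split a request at every block boundary it crosses. blockSize is the
// cache line or page size and is assumed to be a power of two. An access
// that stays inside one block comes back as a single, unchanged piece,
// which is the "just forward it" case.
std::vector<SimpleReq>
splitAtBoundary(const SimpleReq &req, std::uint64_t blockSize)
{
    std::vector<SimpleReq> pieces;
    std::uint64_t addr = req.addr;
    std::uint64_t remaining = req.size;

    while (remaining > 0) {
        // First address past the block that contains addr.
        std::uint64_t blockEnd = (addr & ~(blockSize - 1)) + blockSize;
        // Take whatever fits before the boundary, or all that is left.
        std::uint64_t chunk = std::min(remaining, blockEnd - addr);
        pieces.push_back({addr, static_cast<unsigned>(chunk)});
        addr += chunk;
        remaining -= chunk;
    }
    return pieces;
}
```

With a 64-byte line, a 4-byte access at 0x103e comes back as two 2-byte
pieces (0x103e and 0x1040), while the same access at 0x1004 comes back
as one piece. An ISA that never produces crossers would simply never see
more than one piece.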
There are at least two reasons not to go directly from the TLB to
memory. The first is store-to-load forwarding. The second is that
translation needs to be a step separate from execution so that you can,
say, translate a store ahead of time but not actually send it out until
it's committed. I do still like spreading out the control state, though.

Gabe
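
A rough sketch of why those two cases want the CPU sitting between the
TLB and memory: the store below is translated early, parked in a buffer,
visible to later loads, and only released once it commits. Everything
here (PendingStore, StoreBuffer and its methods) is hypothetical and
heavily simplified; it is not how any M5 CPU model is actually written,
and it only matches whole accesses at identical addresses.

```cpp
#include <cstdint>
#include <deque>

// Hypothetical, simplified model; none of these names come from M5.
struct PendingStore {
    std::uint64_t paddr;  // physical address, already translated
    std::uint64_t data;   // value waiting to be written
    bool committed;       // true once the store instruction has committed
};

class StoreBuffer {
  public:
    // Called when translation finishes, possibly well before commit.
    void addTranslated(std::uint64_t paddr, std::uint64_t data)
    {
        stores.push_back({paddr, data, false});
    }

    // Mark the oldest store that has not committed yet as committed.
    void commitOldest()
    {
        for (auto &s : stores) {
            if (!s.committed) {
                s.committed = true;
                return;
            }
        }
    }

    // Store-to-load forwarding: the youngest buffered store to the same
    // address supplies the data. Returns false if nothing matches.
    bool forward(std::uint64_t paddr, std::uint64_t &data) const
    {
        for (auto it = stores.rbegin(); it != stores.rend(); ++it) {
            if (it->paddr == paddr) {
                data = it->data;
                return true;
            }
        }
        return false;
    }

    // Only a committed store at the head may actually go out to memory.
    bool sendCommitted(PendingStore &out)
    {
        if (stores.empty() || !stores.front().committed)
            return false;
        out = stores.front();
        stores.pop_front();
        return true;
    }

  private:
    std::deque<PendingStore> stores;  // oldest store at the front
};
```

If the TLB sent the access to memory itself, there would be no place for
either the forward() check or the wait between addTranslated() and
sendCommitted().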
