(Getting this back on the list again...)

On Tue, Aug 25, 2009 at 10:32 PM, Gabe Black <[email protected]> wrote:
> Steve Reinhardt wrote:
>> On Tue, Aug 25, 2009 at 9:46 PM, Gabe Black <[email protected]> wrote:
>>
>>> Steve Reinhardt wrote:
>>>
>>>> On Tue, Aug 25, 2009 at 6:56 PM, Gabriel Michael
>>>> Black <[email protected]> wrote:
>>>>
>>>>> That actually gives rise to one of the potential optimizations I mentioned
>>>>> before. If some of the work of getting from bytes to StaticInsts can be
>>>>> delayed until after the ExtMachInst conversion, for instance until the
>>>>> ExtMachInst is used to construct the EmulEnv object or even in the microop
>>>>> constructors, it would only happen if the decode cache missed and
>>>>> potentially contribute less to the overall run time. I looked at it again
>>>>> recently and nothing like that jumped out, but it might be there if
>>>>> someone looked hard enough. A tricky option would be figuring out how much
>>>>> immediate and/or displacement to read in with less work since that's based
>>>>> on a lot of different factors.
>>>>>
>>>> What about the instruction page cache? I thought our summer intern
>>>> from a few years back added a shadow-page-like struct that cached the
>>>> StaticInst objects for a page according to PC. For x86 you'd have to
>>>> make this byte-oriented rather than word-oriented, but the nice thing
>>>> is that, assuming you're also keeping the original byte sequence along
>>>> with the ExtMachInst, all you have to check is that the byte sequence
>>>> matches what's in the actual instruction page.
>>>>
>>> I think what happens is that it uses the PC and compares the
>>> ExtMachInsts generated this time and the time it was cached. You'd have
>>> to do that since you wouldn't want, for instance, the PAL version of one
>>> instruction to be returned when trying to decode the non PAL version,
>>> even if the actual bytes in memory are the same. I think the general
>>> rule is that the ExtMachInst must be different if the end StaticInst is
>>> different, and since I followed that rule it all just works out even for
>>> x86.
>>>
>> Right, I recall that now. Your original comment makes more sense: you
>> really want the ExtMachInst to be just the original byte stream plus
>> any necessary mode info (like PAL mode for Alpha), and not the
>> half-decoded thing you have now. I think that's a great goal to keep
>> in mind if we do dive in to a more thorough restructuring.
>>
>> Steve
>>
> Yeah. Unfortunately figuring out how much immediate and/or displacement
> to read in, something that usually partially determines the length of an
> instruction, seems to require the partial decode. The instructions that
> require either and the size they need seems almost random which is why I
> have some look up tables in there. I think real CPUs approximate and
> then make instructions fix up the PC if they know the quick answer is
> wrong. If we find a way to get around that I think we might actually be
> able to get it in there without any other changes. I was thinking before
> that we could do the extra processing after an early cache lookup but
> before decode, but that's not really necessary since there are already
> steps inside the decoder that could handle some of it.
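
To make the rule quoted above concrete (look up by PC, then reuse the cached object only if the ExtMachInst matches), here is a minimal sketch. Everything in it is a simplified stand-in invented for illustration: the two-field ExtMachInst, the empty StaticInst, and the PcDecodeCache class are assumptions, not the simulator's actual types.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>

// Hypothetical stand-in: an encoding plus the mode info that changes
// what the same bytes decode to (for example PAL mode on Alpha).
struct ExtMachInst {
    uint64_t bits;
    bool palMode;
    bool operator==(const ExtMachInst &o) const {
        return bits == o.bits && palMode == o.palMode;
    }
};

// Hypothetical stand-in for a decoded instruction object.
struct StaticInst {
    ExtMachInst machInst;
    explicit StaticInst(const ExtMachInst &mi) : machInst(mi) {}
};

class PcDecodeCache {
  public:
    // Reuse the entry cached for pc only if it was built from an
    // identical ExtMachInst; otherwise decode again and replace it.
    std::shared_ptr<StaticInst> lookup(uint64_t pc, const ExtMachInst &mi) {
        auto it = cache.find(pc);
        if (it != cache.end() && it->second->machInst == mi) {
            ++hits;
            return it->second;
        }
        ++misses;
        // Constructing the object stands in for the full decode.
        auto inst = std::make_shared<StaticInst>(mi);
        assert(inst->machInst == mi);
        cache[pc] = inst;
        return inst;
    }

    int hits = 0;
    int misses = 0;

  private:
    std::unordered_map<uint64_t, std::shared_ptr<StaticInst>> cache;
};
```

Because the mode bit is part of the comparison, the same bytes at the same PC decoded in PAL and non-PAL mode never share a cached object in this sketch.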

As for the cache lookup, if you've already got a cached instruction then
that will tell you how many bytes you need to look at to validate the
cached object. The only hiccup I see is that if the cache lookup fails,
you may need to iteratively build the ExtMachInst as you decode; I don't
know if that's different from what happens now or not.

Steve
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
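
A rough sketch of the byte-validated lookup described in this message, under stated assumptions: the cached entry keeps the raw bytes it was decoded from plus one mode bit, and a hit requires both to match. The names CachedInst, ByteValidatedCache, and toyLength are hypothetical, and the length rule is a toy that has nothing to do with real x86 encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

// Hypothetical cached record: the raw bytes the instruction was decoded
// from, plus the mode bit that affected the decode.
struct CachedInst {
    std::vector<uint8_t> bytes;
    bool mode;
};

// Toy length rule, for illustration only: the low two bits of the first
// byte give the number of extra bytes. Real x86 length decoding depends
// on prefixes, opcode, ModRM, and more.
inline size_t toyLength(const uint8_t *mem) { return 1 + (mem[0] & 0x3); }

class ByteValidatedCache {
  public:
    // mem points at the bytes for pc; avail is how many are readable.
    const CachedInst &lookup(uint64_t pc, const uint8_t *mem, size_t avail,
                             bool mode) {
        assert(avail >= 1);
        auto it = cache.find(pc);
        if (it != cache.end()) {
            const CachedInst &c = it->second;
            // The cached entry itself says how many bytes to compare.
            if (c.mode == mode && c.bytes.size() <= avail &&
                std::memcmp(c.bytes.data(), mem, c.bytes.size()) == 0) {
                ++hits;
                return c;
            }
        }
        // Miss: the length is only known after (toy) decoding, so the
        // byte record is built as part of the decode step.
        ++misses;
        size_t len = toyLength(mem);
        assert(len <= avail);
        CachedInst fresh{std::vector<uint8_t>(mem, mem + len), mode};
        return cache[pc] = fresh;
    }

    int hits = 0;
    int misses = 0;

  private:
    std::unordered_map<uint64_t, CachedInst> cache;
};
```

On a hit no length decoding happens at all, since the comparison length comes from the cached entry; only the miss path has to work out how many bytes the instruction occupies.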
