Steve Reinhardt wrote:
> So I've always felt that for x86 we should move to having the raw
> instruction bytes and a length field in the StaticInst so that we can
> check for decode page cache hits without going through the predecoder.
> Wouldn't this solve both of your problems, since in the hit case you
> would neither call the predecoder to generate an ExtMachInst nor need
> to compare ExtMachInsts to see if you have a hit?
>
> I agree this requires some different handling of decode context info,
> since you'll need to compare that along with the raw bytes. I didn't
> completely follow your earlier argument about having multiple decoders
> and why that's better than what we have now, but maybe we should get
> back into that.
>
> Steve
> _______________________________________________
> m5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/m5-dev

One complication here is instructions that cross (pick unit) boundaries. One of the functions of the predecoder is to collect these instructions as you go along and aggregate them into one blob of bytes. x86 instructions can theoretically be up to 15 bytes, but with some of the new encodings they're using even that might have to be bumped up. That's not a fundamental problem, but it would mean that we'd have to have some extra mechanism to do the aggregation. I was actually thinking we'd need something like that for a predecoder cache too, and then something to feed the possibly partially collected blob of bytes into the predecoder when there's a miss.

A nice property of putting that in the predecoder is that it hides the special handling from all the other parts of the CPU, etc. This sort of mechanism would be purely overhead in ISAs like SPARC and Alpha, but it would just go away with the change in predecoder. Also, the predecoder would know what contextualizing state was in effect, so it could manage that part of things as well.

One downside of putting it in the predecoder, which you've brought up, is that it means there are two levels of caching, and one level just feeds a lookup in the next. Most of the time there would be a hit, I'd imagine, and it would be nice to short-circuit that and just go directly to the StaticInst. I don't see a good way to capture both of these benefits, and choosing between the performance boost and the cleaner/simpler/more compartmentalized implementation, I'd go with the latter. If you -do- see a way, please let me know.

The idea behind having multiple decoders is that rather than have one decoder that decides every time what mode you're in, there would be multiple decoders, one for each mode. When the control state changed in a way that dictated a different decoder be used, the other one would be switched in.

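A minimal sketch of the per-mode decoder idea described above, written in Python for brevity rather than in the simulator's own language. Every name in it (StaticInst as a two-field record, ModeDecoder, DecoderSwitch, and the mode names "legacy32" and "long64") is hypothetical and not taken from the M5 source, and real decoding is replaced by a placeholder. It only illustrates that once the mode is baked into a decoder, the raw bytes alone can key that decoder's cache.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class StaticInst:
    """Stand-in for a decoded instruction (hypothetical, not M5's class)."""
    mode: str
    raw: bytes


@dataclass
class ModeDecoder:
    """Decoder specialized for one mode, with its own private cache."""
    mode: str
    cache: Dict[bytes, StaticInst] = field(default_factory=dict)
    misses: int = 0

    def decode(self, raw: bytes) -> StaticInst:
        # The mode is fixed for this decoder, so the raw bytes alone
        # identify an instruction within this decoder's pool.
        inst = self.cache.get(raw)
        if inst is None:
            self.misses += 1
            inst = StaticInst(self.mode, raw)  # placeholder for real decoding
            self.cache[raw] = inst
        return inst


class DecoderSwitch:
    """Holds one decoder per mode and swaps them when control state changes."""

    def __init__(self, modes: List[str]) -> None:
        self.decoders = {m: ModeDecoder(m) for m in modes}
        self.current = self.decoders[modes[0]]

    def set_mode(self, mode: str) -> None:
        # Called only when the control state changes, not on every decode.
        self.current = self.decoders[mode]

    def decode(self, raw: bytes) -> StaticInst:
        return self.current.decode(raw)


# Usage: the same bytes hit in one mode's cache but decode separately
# after switching to another mode.
sw = DecoderSwitch(["legacy32", "long64"])
first = sw.decode(b"\x89\xd8")
second = sw.decode(b"\x89\xd8")   # served from the legacy32 cache
sw.set_mode("long64")
third = sw.decode(b"\x89\xd8")    # same bytes, different pool
```

The mode check happens once in set_mode() rather than on every decode, and each cache stays one-to-one within its own pool of instructions. Sharing entries between modes where the decoding is identical is left out of this sketch.
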
Similarly for predecoders, information like the default size of various registers, etc., could be baked in and a variant selected ahead of time based on the current mode. In both cases the individual decoder/predecoder could maintain its own separate cache, or selectively share when possible, and not worry about the contextualizing info as much. Basically this maintains the one-to-one mapping into a particular pool of instructions but drops it globally.

Gabe
