(Getting this back on the list again...)

On Tue, Aug 25, 2009 at 10:32 PM, Gabe Black<[email protected]> wrote:
> Steve Reinhardt wrote:
>> On Tue, Aug 25, 2009 at 9:46 PM, Gabe Black<[email protected]> wrote:
>>
>>> Steve Reinhardt wrote:
>>>
>>>> On Tue, Aug 25, 2009 at 6:56 PM, Gabriel Michael
>>>> Black<[email protected]> wrote:
>>>>
>>>>
>>>>> That actually gives rise to one of the potential optimizations I mentioned
>>>>> before. If some of the work of getting from bytes to StaticInsts can be
>>>>> delayed until after the ExtMachInst conversion, for instance until the
>>>>> ExtMachInst is used to construct the EmulEnv object or even in the microop
>>>>> constructors, it would only happen if the decode cache missed and
>>>>> potentially contribute less to the overall run time. I looked at it again
>>>>> recently and nothing like that jumped out, but it might be there if someone
>>>>> looked hard enough. A tricky option would be figuring out how much immediate
>>>>> and/or displacement to read in with less work since that's based on a lot of
>>>>> different factors.
>>>>>
>>>>>
>>>> What about the instruction page cache?  I thought our summer intern
>>>> from a few years back added a shadow-page-like struct that cached the
>>>> StaticInst objects for a page according to PC.  For x86 you'd have to
>>>> make this byte-oriented rather than word-oriented, but the nice thing
>>>> is that, assuming you're also keeping the original byte sequence along
>>>> with the ExtMachInst, all you have to check is that the byte sequence
>>>> matches what's in the actual instruction page.
>>>>
>>>>
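
For concreteness, a per-page cache along those lines might look roughly like
the sketch below; the names and types are invented for illustration and
aren't the actual m5 code:

    #include <cstdint>
    #include <cstring>
    #include <map>
    #include <memory>

    using Addr = uint64_t;
    class StaticInst;
    using StaticInstPtr = std::shared_ptr<StaticInst>;

    // One entry per decoded instruction, keyed by its byte offset in the
    // page.  The original bytes are kept so a hit can be validated with a
    // straight memcmp against the current contents of the page.
    struct CachedInst {
        uint8_t bytes[15];      // x86 instructions are at most 15 bytes
        int length;             // how many of those bytes are meaningful
        StaticInstPtr inst;
    };

    struct InstPageCache {
        std::map<Addr, CachedInst> entries;

        StaticInstPtr lookup(Addr offset, const uint8_t *pageBytes) const {
            auto it = entries.find(offset);
            if (it == entries.end())
                return nullptr;
            const CachedInst &c = it->second;
            // Only reuse the StaticInst if the bytes haven't changed.
            if (std::memcmp(pageBytes + offset, c.bytes, c.length) != 0)
                return nullptr;
            return c.inst;
        }
    };
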
>>> I think what happens is that it uses the PC and compares the ExtMachInst
>>> generated this time with the one generated when it was cached. You'd have
>>> to do that since you wouldn't want, for instance, the PAL version of an
>>> instruction to be returned when trying to decode the non-PAL version,
>>> even if the actual bytes in memory are the same. I think the general
>>> rule is that the ExtMachInst must be different if the end StaticInst is
>>> different, and since I followed that rule it all just works out even for
>>> x86.
>>>
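
The rule in code form, roughly (invented names, just to show the shape; the
caller would pick the entry by hashing the PC, and ExtMachInst/StaticInstPtr
are assumed from the surrounding code):

    // The index only narrows the search; the stored ExtMachInst is compared
    // in full on a hit, so any state that can change the decode (PAL mode,
    // CPU mode, ...) has to be folded into the ExtMachInst for the
    // "different StaticInst implies different ExtMachInst" rule to hold.
    struct DecodeCacheEntry {
        bool valid = false;
        ExtMachInst emi;
        StaticInstPtr inst;
    };

    StaticInstPtr
    decodeCached(DecodeCacheEntry &entry, const ExtMachInst &emi)
    {
        if (entry.valid && entry.emi == emi)   // same bytes *and* same mode bits
            return entry.inst;
        entry.inst = fullDecode(emi);          // miss: do the real decode
        entry.emi = emi;
        entry.valid = true;
        return entry.inst;
    }
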
>>
>> Right, I recall that now.  Your original comment makes more sense: you
>> really want the ExtMachInst to be just the original byte stream plus
>> any necessary mode info (like PAL mode for Alpha), and not the
>> half-decoded thing you have now.  I think that's a great goal to keep
>> in mind if we do dive into a more thorough restructuring.
>>
>> Steve
>>
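
A purely illustrative layout for that kind of ExtMachInst (not the current
x86 one) might be:

    #include <algorithm>
    #include <cstdint>

    // Purely illustrative: the "ideal" ExtMachInst is nothing more than the
    // fetched bytes plus whatever mode bits can change how they decode.
    struct ExtMachInst {
        uint8_t bytes[15];   // the instruction bytes exactly as fetched
        uint8_t numBytes;    // how many of them the instruction actually used
        uint8_t mode;        // e.g. PAL vs. non-PAL on Alpha, CPU mode on x86

        bool operator==(const ExtMachInst &o) const {
            return mode == o.mode && numBytes == o.numBytes &&
                   std::equal(bytes, bytes + numBytes, o.bytes);
        }
    };
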
>
> Yeah. Unfortunately, figuring out how much immediate and/or displacement
> to read in, something that usually partially determines the length of an
> instruction, seems to require the partial decode. The instructions that
> require either, and the sizes they need, seem almost random, which is why
> I have some lookup tables in there. I think real CPUs approximate and
> then have the instructions fix up the PC when the quick answer turns out
> to be wrong. If we find a way to get around that, I think we might
> actually be able to get it in there without any other changes. I was
> thinking before that we could do the extra processing after an early
> cache lookup but before decode, but that's not really necessary since
> there are already steps inside the decoder that could handle some of it.
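
To make the "lot of different factors" concrete, here's a much-simplified
illustration of the immediate-size dependence (a tiny excerpt for
illustration, not the real tables):

    #include <cstdint>

    // Much simplified: immediate size depends on the opcode *and* the
    // effective operand size (prefixes, mode), and displacement size on the
    // ModRM/SIB bytes and the address size -- which is why you can't know
    // the instruction length without a partial decode.
    static int immSize(uint8_t opcode, int opSize /* 2 or 4 */)
    {
        switch (opcode) {
          case 0x6A: return 1;       // PUSH imm8
          case 0x68: return opSize;  // PUSH imm16/imm32, set by prefixes/mode
          case 0x04: return 1;       // ADD AL, imm8
          case 0x05: return opSize;  // ADD eAX, imm16/imm32
          default:   return 0;       // most opcodes: table-driven in practice
        }
    }
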

As far as the cache lookup goes, if you've already got a cached instruction
then that will tell you how many bytes you need to look at to validate
the cached object.  The only hiccup I see is that if the cache lookup
fails, you may need to iteratively build the ExtMachInst as you
decode; I don't know whether that's different from what happens now or not.
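
Roughly what that flow could look like, as a sketch with invented helper
names (a per-page cache and an incrementally-built ExtMachInst are assumed,
along the lines above):

    // Sketch only: a hit records how long the instruction was last time, so
    // validating it is a memcmp of exactly that many bytes; only a miss has
    // to discover the length by consuming bytes one at a time while it
    // builds up the ExtMachInst.
    StaticInstPtr
    fetchDecode(InstPageCache &pageCache, Addr pc, const uint8_t *fetched)
    {
        if (const CachedInst *c = pageCache.find(pc)) {
            if (std::memcmp(fetched, c->bytes, c->length) == 0)
                return c->inst;                  // bytes unchanged: reuse it
        }
        ExtMachInst emi;
        int len = 0;
        while (!emi.complete())                  // iterative build on a miss
            emi.consume(fetched[len++]);
        StaticInstPtr si = fullDecode(emi);
        pageCache.insert(pc, fetched, len, si);
        return si;
    }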

Steve