On Tue, Aug 25, 2009 at 6:56 PM, Gabriel Michael
Black<[email protected]> wrote:
>
> The decode cache is being used. The cache is keyed on ExtMachInsts, and x86
> translates the stream of instruction bytes into those before they hit the
> decoder. X86 defines those as a structure that holds all the relevant
> information from the bytes but in a uniform way.
>
> That actually gives rise to one of the potential optimizations I mentioned
> before. If some of the work of getting from bytes to StaticInsts can be
> delayed until after the ExtMachInst conversion, for instance until the
> ExtMachInst is used to construct the EmulEnv object or even in the microop
> constructors, it would only happen if the decode cache missed and
> potentially contribute less to the overall run time. I looked at it again
> recently and nothing like that jumped out, but it might be there if someone
> looked hard enough. A tricky option would be figuring out how much immediate
> and/or displacement to read in with less work since that's based on a lot of
> different factors.
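
To make the keying described above concrete, here is a minimal sketch of a
decode cache indexed by a uniform instruction struct. It is not the M5 code:
ToyExtMachInst, ToyStaticInst and ToyDecodeCache are hypothetical stand-ins,
and the real ExtMachInst carries far more state (prefixes, SIB, operand
sizes, and so on).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <unordered_map>

// Hypothetical, heavily reduced stand-in for x86's ExtMachInst: the raw
// instruction bytes have already been normalized into fixed fields.
struct ToyExtMachInst {
    uint8_t opcode;
    uint8_t modRM;
    uint64_t immediate;
    uint64_t displacement;

    bool operator==(const ToyExtMachInst &o) const {
        return opcode == o.opcode && modRM == o.modRM &&
               immediate == o.immediate && displacement == o.displacement;
    }
};

// Hash over all fields so the struct can key an unordered_map.
struct ToyExtMachInstHash {
    std::size_t operator()(const ToyExtMachInst &m) const {
        std::size_t h = std::hash<uint64_t>()(m.immediate);
        h = h * 31 + std::hash<uint64_t>()(m.displacement);
        h = h * 31 + m.opcode;
        h = h * 31 + m.modRM;
        return h;
    }
};

// Hypothetical stand-in for a decoded StaticInst.
struct ToyStaticInst {
    ToyExtMachInst machInst;
};

class ToyDecodeCache {
  public:
    // Return the cached decode if present; otherwise "decode" and cache it.
    // Any work deferred to the miss path only runs once per distinct key.
    std::shared_ptr<const ToyStaticInst> decode(const ToyExtMachInst &m) {
        auto it = cache.find(m);
        if (it != cache.end()) {
            ++hits;
            return it->second;
        }
        ++misses;
        auto inst = std::make_shared<const ToyStaticInst>(ToyStaticInst{m});
        cache.emplace(m, inst);
        return inst;
    }

    unsigned hits = 0;
    unsigned misses = 0;

  private:
    std::unordered_map<ToyExtMachInst, std::shared_ptr<const ToyStaticInst>,
                       ToyExtMachInstHash> cache;
};
```

The point of the sketch is only that everything done before the lookup
(here, building the ToyExtMachInst) is paid on every fetch, while everything
done inside the miss branch is paid once per distinct instruction.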

What about the instruction page cache?  I thought our summer intern
from a few years back added a shadow-page-like struct that cached the
StaticInst objects for a page according to PC.  For x86 you'd have to
make this byte-oriented rather than word-oriented, but the nice thing
is that, assuming you're also keeping the original byte sequence along
with the ExtMachInst, all you have to check is that the byte sequence
matches what's in the actual instruction page.
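
Roughly what I have in mind, as a sketch only. I have not checked it against
the existing page-cache code, so every name here (ToyInstPageCache,
ToyDecodedInst, the 4096-byte page size) is made up for illustration, and
instructions that straddle a page boundary are simply not cached:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

constexpr std::size_t ToyPageBytes = 4096;  // assumed page size

// Hypothetical stand-in for the decoded StaticInst.
struct ToyDecodedInst {
    int id;
};

// One slot per byte offset in the page (byte-oriented, not word-oriented).
struct ToyPageEntry {
    std::vector<uint8_t> bytes;                  // original encoding
    std::shared_ptr<const ToyDecodedInst> inst;  // cached decode, may be null
};

class ToyInstPageCache {
  public:
    ToyInstPageCache() : entries(ToyPageBytes) {}

    // Remember the decode for the instruction starting at 'offset'.
    void insert(std::size_t offset, const uint8_t *bytes, std::size_t len,
                std::shared_ptr<const ToyDecodedInst> inst) {
        if (offset >= ToyPageBytes || len > ToyPageBytes - offset)
            return;  // page-crossing instructions are not handled here
        entries[offset] =
            ToyPageEntry{std::vector<uint8_t>(bytes, bytes + len),
                         std::move(inst)};
    }

    // Return the cached decode only if the bytes currently in the real
    // instruction page still match the bytes the entry was built from.
    std::shared_ptr<const ToyDecodedInst>
    lookup(std::size_t offset, const uint8_t *page) const {
        if (offset >= ToyPageBytes)
            return nullptr;
        const ToyPageEntry &e = entries[offset];
        if (!e.inst)
            return nullptr;
        if (std::memcmp(page + offset, e.bytes.data(), e.bytes.size()) != 0)
            return nullptr;  // page contents changed; entry is stale
        return e.inst;
    }

  private:
    std::vector<ToyPageEntry> entries;
};
```

On a hit the only per-fetch cost is the memcmp against the live page, which
is the check described above; the bytes-to-ExtMachInst step is skipped
entirely.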

>
>> Probably the easiest way would just be to take a current decoder.cc,
>> hack it up manually to match one of the things we're proposing, then
>> invoke gcc manually on the result and time it.  (Not necessarily easy
>> in an absolute sense, but it's just a one-off try so there's no point
>> in doing anything more automated IMO.)
>
> So are you volunteering to split up the, according to wc -l, ~110,000 line
> file? :-) That'll be quite a task.

I see no mention of specific individuals in my comment!  A lot depends
on whether you can hack out a few contiguous 30,000-line chunks or if
you'd have to do a lot of interleaving a few lines at a time to get it
to work.  Even in the latter case, some emacs macro creativity could
possibly go a long way.  I don't object to giving it a shot myself,
but it won't be soon.

Steve
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
