Quoting Steve Reinhardt <[email protected]>:

> On Tue, Aug 25, 2009 at 6:30 PM, Gabriel Michael
> Black<[email protected]> wrote:
>>>
>>> Actually I was hoping that you wouldn't have to include all the .hh
>>> files.  If the main decoder in main_decoder.cc calls out to a
>>> subdecoder for x87 ops in x87_decoder.cc, then a header that declares
>>> the instruction objects for x87 instructions would only need to be
>>> included in the latter .cc file.
>>
>> That's true, but then we may have function call overhead at those points
>> since I don't know if gcc does cross object module inlining. If the header
>> files prove to be a major contributor then that may still be worth it.
>
> Yes, you'd pay for the function call, but that shouldn't be too
> noticeable, particularly if we cache decoded instructions.  (Does x86
> use the decode cache or did you have to disable it b/c it didn't deal
> with variable length instructions?)
>

The decode cache is being used. The cache is keyed on ExtMachInsts,  
and x86 translates the stream of instruction bytes into those before  
they hit the decoder. X86 defines those as a structure that holds all  
the relevant information from the bytes but in a uniform way.

That actually gives rise to one of the potential optimizations I  
mentioned before. If some of the work of getting from bytes to  
StaticInsts can be delayed until after the ExtMachInst conversion, for  
instance until the ExtMachInst is used to construct the EmulEnv object  
or even in the microop constructors, it would only happen if the  
decode cache missed and potentially contribute less to the overall run  
time. I looked at it again recently and nothing like that jumped out,  
but it might be there if someone looked hard enough. One tricky  
candidate would be figuring out how much immediate and/or displacement  
data to read in with less work, since that depends on a lot of  
different factors.

> Probably the easiest way would just be to take a current decoder.cc,
> hack it up manually to match one of the things we're proposing, then
> invoke gcc manually on the result and time it.  (Not necessarily easy
> in an absolute sense, but it's just a one-off try so there's no point
> in doing anything more automated IMO.)

So are you volunteering to split up the ~110,000-line file (according  
to wc -l)? :-) That'll be quite a task.

Gabe
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
