Quoting Steve Reinhardt <[email protected]>:

> On Tue, Aug 25, 2009 at 6:30 PM, Gabriel Michael
> Black<[email protected]> wrote:
>>>
>>> Actually I was hoping that you wouldn't have to include all the .hh
>>> files. If the main decoder in main_decoder.cc calls out to a
>>> subdecoder for x87 ops in x87_decoder.cc, then a header that declares
>>> the instruction objects for x87 instructions would only need to be
>>> included in the latter .cc file.
>>
>> That's true, but then we may have function call overhead at those points
>> since I don't know if gcc does cross object module inlining. If the header
>> files prove to be a major contributor then that may still be worth it.
>
> Yes, you'd pay for the function call, but that shouldn't be too
> noticeable, particularly if we cache decoded instructions. (Does x86
> use the decode cache or did you have to disable it b/c it didn't deal
> with variable length instructions?)
>
The decode cache is being used. The cache is keyed on ExtMachInsts, and
x86 translates the stream of instruction bytes into those before they
hit the decoder. x86 defines an ExtMachInst as a structure that holds all
the relevant information from the bytes, but in a uniform way.

That actually gives rise to one of the potential optimizations I
mentioned before. If some of the work of getting from bytes to
StaticInsts can be delayed until after the ExtMachInst conversion, for
instance until the ExtMachInst is used to construct the EmulEnv object
or even until the microop constructors run, that work would only happen
when the decode cache misses, and so it would potentially contribute
less to the overall run time. I looked at it again recently and nothing
like that jumped out, but it might be there if someone looked hard
enough. A trickier option would be finding a cheaper way to figure out
how much immediate and/or displacement to read in, since that depends on
a lot of different factors.

> Probably the easiest way would just be to take a current decoder.cc,
> hack it up manually to match one of the things we're proposing, then
> invoke gcc manually on the result and time it. (Not necessarily easy
> in an absolute sense, but it's just a one-off try so there's no point
> in doing anything more automated IMO.)

So are you volunteering to split up the, according to wc -l, ~110,000
line file? :-) That'll be quite a task.

Gabe
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
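To make the decode cache point above concrete, here is a rough sketch of the idea: a cache keyed on a uniform structure, where the expensive byte-to-instruction work runs only on a miss. This is not the actual M5 code. Every name in it (ExtMachInstSketch, StaticInstSketch, DecodeCacheSketch, slowDecode, missCount) and every field is a hypothetical stand-in invented for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for an ExtMachInst: a uniform, fixed-layout
// summary of a variable-length instruction. Fields are illustrative only.
struct ExtMachInstSketch {
    std::uint8_t opcode;
    std::uint8_t modrm;
    std::int64_t immediate;
    std::int64_t displacement;

    bool operator==(const ExtMachInstSketch &o) const {
        return opcode == o.opcode && modrm == o.modrm &&
               immediate == o.immediate && displacement == o.displacement;
    }
};

// Hash over every field so the struct can key an unordered_map.
struct ExtMachInstHash {
    std::size_t operator()(const ExtMachInstSketch &m) const {
        std::size_t h = std::hash<std::uint8_t>()(m.opcode);
        auto mix = [&h](std::size_t v) {
            h ^= v + 0x9e3779b9u + (h << 6) + (h >> 2);
        };
        mix(std::hash<std::uint8_t>()(m.modrm));
        mix(std::hash<std::int64_t>()(m.immediate));
        mix(std::hash<std::int64_t>()(m.displacement));
        return h;
    }
};

// Hypothetical stand-in for a decoded StaticInst.
struct StaticInstSketch {
    std::uint8_t opcode;
};

class DecodeCacheSketch {
  public:
    std::shared_ptr<StaticInstSketch> decode(const ExtMachInstSketch &m) {
        auto it = cache.find(m);
        if (it != cache.end())
            return it->second;        // hit: no decode work is done
        ++misses;
        auto inst = slowDecode(m);    // miss: deferred work happens here
        cache.emplace(m, inst);
        return inst;
    }

    std::size_t missCount() const { return misses; }

  private:
    // Placeholder for the expensive part (building EmulEnv-like state,
    // constructing microops, and so on in a real decoder).
    std::shared_ptr<StaticInstSketch> slowDecode(const ExtMachInstSketch &m) {
        return std::make_shared<StaticInstSketch>(StaticInstSketch{m.opcode});
    }

    std::unordered_map<ExtMachInstSketch, std::shared_ptr<StaticInstSketch>,
                       ExtMachInstHash> cache;
    std::size_t misses = 0;
};
```

Decoding the same ExtMachInstSketch twice returns the same cached object and only bumps the miss counter once, so anything moved into slowDecode() is paid for per distinct instruction rather than per execution.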
