On 23/01/2018 23:10, Gabe Black wrote:
Trimmed, responses inline.

On Tue, Jan 23, 2018 at 2:23 AM, Andreas Sandberg 
<andreas.sandb...@arm.com<mailto:andreas.sandb...@arm.com>> wrote:
On 22/01/2018 23:53, Gabe Black wrote:

It isn't really undermining the idea of having multiple ISAs in the same binary. The 
biggest "problem" is that you won't be able to use a detailed timing model for 
one ISA with another ISA. That's probably fine though. I'm pretty sure it would be hard 
to make a timing model that is realistic for both Arm and x86 given that we most likely 
classify instructions in slightly different ways.

It's problematic to have features which fundamentally don't work for a particular
ISA, although that's not what I'm talking about. I'm talking about the #ifdef that's
based on the arch, since the X86 ExtMachInst can't be &-ed against and won't
compile that way.


A simple solution would be to just add an instAsUint32 method to the static
instruction interface and stub it out on x86. The code would "just work" in that
case, but it would be a bit ugly.
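
A minimal sketch of what I mean (method and member names are made up here, not
the real gem5 interface):

    #include <cstdint>

    // Expose the raw encoding through the static instruction interface so
    // generic CPU code can match on it, and stub it out where the concept
    // doesn't apply.
    class StaticInst
    {
      public:
        // Return the raw 32-bit encoding where one exists; ISAs with
        // variable-length encodings (x86) just return 0.
        virtual uint32_t instAsUint32() const { return 0; }

        virtual ~StaticInst() {}
    };

    class ArmStaticInst : public StaticInst
    {
      public:
        explicit ArmStaticInst(uint32_t _machInst) : machInst(_machInst) {}

        // Arm instructions have a fixed 32-bit encoding, so just hand it back.
        uint32_t instAsUint32() const override { return machInst; }

      protected:
        uint32_t machInst;
    };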


The problem is that different microarchitectures would need different
classifications (i.e., different uarchs route the same instruction
differently) and some instructions don't have fixed latencies (e.g., division).
We could grow the number of instruction classes, but that probably won't scale 
since each new microarchitecture we want to model would need new instruction 
classes. Similarly, each time we implement a new divider (or some other clever 
gadget), we'd need to implement a custom timing model in C++.
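
Just to make the variable-latency point concrete: a divide's cost typically
depends on the operand values, which a single per-class latency can't express.
The formula and constants below are invented purely for illustration, not taken
from any real divider:

    #include <algorithm>
    #include <cstdint>

    // Illustration only: an operand-dependent divide latency of the kind a
    // fixed per-class latency can't capture.
    int
    divLatency(uint64_t dividend, uint64_t divisor)
    {
        // Pretend the divider early-outs based on how many quotient bits it
        // has to produce: a base cost plus one cycle per 4 quotient bits.
        int dividendBits = 64 - __builtin_clzll(dividend | 1);
        int divisorBits = 64 - __builtin_clzll(divisor | 1);
        int quotientBits = std::max(0, dividendBits - divisorBits) + 1;
        return 4 + (quotientBits + 3) / 4;
    }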


Yeah, I misspoke when I said latencies. What I should have said was that the
instructions would be classified into groups, and then those groups would be
matched against by functional units in the CPU model instead of using bitfields.
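
Roughly this shape on the CPU side (types and names are invented for the sketch,
not the current FU code):

    #include <cstdint>
    #include <set>

    // A functional unit declares the instruction groups it can execute, and
    // issue just checks set membership instead of &-ing the raw encoding
    // against a per-unit mask.
    using InstGroup = uint32_t;

    struct FuncUnit
    {
        std::set<InstGroup> capabilities;

        // E.g. a unit configured with {2, 7} accepts anything the decoder
        // tagged as group 2 or group 7.
        bool accepts(InstGroup group) const
        {
            return capabilities.count(group) != 0;
        }
    };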

That should work. I think I got lost in the latency calculation DSL, which is
actually completely independent of the instruction matcher.


Alternatively, you could make your decoder programmable, where it would take some
sort of classifier that applies the groupings in the decoder itself? This
would also make the CPU model more efficient, since the instructions aren't
going to change groups, but they still get reclassified every time they execute.

That's definitely a possibility, but if we make that classifier a part of the
C++ world, we effectively encode parts of our timing models in C++. That would
be highly undesirable and would make it a lot harder to make and distribute new
custom timing models. This sort of mechanism could work if we make instruction
classification programmable from Python and add the ability to define custom
instruction classes in Python (though I'm not sure how different that would be
from what we do currently). It still wouldn't solve the issue for variable-latency
instructions.


What I was thinking is that instead of having a pseudo-ISA-independent mechanism
living in the CPU, which is really a second decoder, you could have the same mechanism
live in the ARM decoder and just tag instructions with groups. So instead of saying
this unit works with instructions where i & 0xf == 0xa, you'd say instructions where
i & 0xf == 0xa go in group 2, and functional units 1, 3 and 5 act on group 2. Then the
CPU model is generic, since it's just operating on group numbers which are totally
artificial and independent of the ISA, and the ISA-specific part (grouping instructions) is
in the decoder, which is already inherently very ISA dependent. By making an instance of
the decoder programmable, the decoder for cpu X can be set up to group instructions
differently than cpu Y, and in a way equivalent to how cpu X's functional units used to
claim instructions.
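
Something roughly like this inside the decoder (hypothetical types, not existing
gem5 code):

    #include <cstdint>
    #include <utility>
    #include <vector>

    // The decoder is configured with (mask, match) -> group rules and tags
    // each decoded instruction with a group number once, at decode time.
    // Functional units then just list the group numbers they accept.
    struct GroupRule
    {
        uint32_t mask;
        uint32_t match;
        int group;
    };

    class GroupClassifier
    {
      public:
        // Rules come from the config, per decoder/CPU instance. E.g.
        // {0xf, 0xa, 2} puts instructions where (i & 0xf) == 0xa into group 2.
        explicit GroupClassifier(std::vector<GroupRule> _rules)
            : rules(std::move(_rules)) {}

        int classify(uint32_t inst) const
        {
            for (const auto &r : rules) {
                if ((inst & r.mask) == r.match)
                    return r.group;
            }
            return 0; // default group for anything unmatched
        }

      private:
        std::vector<GroupRule> rules;
    };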

This sounds like a sensible design if we allow additional groups to be defined
from the configuration script. We could probably do that by defining a
model-specific range in the OpClass enum. We would also need a separate
decoder and decoder cache instance per CPU instance if we want to be able to simulate
multiple microarchitectures (e.g., a big.LITTLE system) at the same time.
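
FWIW, something along these lines (the names, values and helper are illustrative,
not the real OpClass definition):

    // Reserve a block of OpClass values for model-specific groups that config
    // scripts can hand out, so generic code can tell them apart from the
    // fixed, architectural classes.
    enum OpClass
    {
        No_OpClass = 0,
        IntAluOp,
        IntMultOp,
        IntDivOp,
        // ... the rest of the fixed classes would go here ...

        FirstModelSpecificOpClass = 128,
        LastModelSpecificOpClass = 255,
        Num_OpClasses
    };

    // Is this a config-defined, model-specific class?
    inline bool
    isModelSpecificOpClass(OpClass op)
    {
        return op >= FirstModelSpecificOpClass &&
               op <= LastModelSpecificOpClass;
    }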

Cheers,
Andreas

